[llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)
Joel E. Denny via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 15 09:18:50 PDT 2025
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/128785
>From f4135207e955f6c2e358cad54a7ef6f2f18087f8 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Wed, 19 Mar 2025 16:19:40 -0400
Subject: [PATCH 01/27] [LoopPeel] Fix branch weights' effect on block
frequencies
For example:
```
declare void @f(i32)
define void @test(i32 %n) {
entry:
br label %do.body
do.body:
%i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
%inc = add i32 %i, 1
call void @f(i32 %i)
%c = icmp sge i32 %inc, %n
br i1 %c, label %do.end, label %do.body, !prof !0
do.end:
ret void
}
!0 = !{!"branch_weights", i32 1, i32 9}
```
Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9). That is, the loop exits once every 10 iterations on average
and thus has an estimated trip count of 10.
`opt -passes='print<block-freq>'` shows that 10 is indeed the
frequency of the loop body:
```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
- entry: float = 1.0, int = 1801439852625920
- do.body: float = 10.0, int = 18014398509481984
- do.end: float = 1.0, int = 1801439852625920
```
Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop. This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.
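As a quick numeric check of the key observation, here is a minimal
sketch in plain C++ (not LLVM code). The helper names
`reachFrequency` and `loopBodyFrequency` are hypothetical and exist
only for illustration. They assume that every reached iteration exits
with probability exit/(exit+backedge):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an LLVM API): frequency of reaching iteration
// Iter (counting from 0), relative to one entry into the loop, when every
// reached iteration exits with probability ExitW / (ExitW + BackW).
double reachFrequency(unsigned ExitW, unsigned BackW, unsigned Iter) {
  double Sum = static_cast<double>(ExitW) + static_cast<double>(BackW);
  assert(Sum > 0.0 && "need at least one non-zero weight");
  double Continue = static_cast<double>(BackW) / Sum;
  return std::pow(Continue, static_cast<double>(Iter));
}

// Hypothetical helper: total loop body frequency, i.e. the sum of the
// geometric series of reachFrequency over all iterations, which equals
// (ExitW + BackW) / ExitW.
double loopBodyFrequency(unsigned ExitW, unsigned BackW) {
  assert(ExitW > 0 && "the loop must be able to exit");
  return (static_cast<double>(ExitW) + static_cast<double>(BackW)) /
         static_cast<double>(ExitW);
}
```

With weights 1 and 9, the frequency of reaching iterations 0, 1, and 2
is 1.0, 0.9, and 0.81, and the series sums to 10 even though the
weights are identical for every iteration.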
Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop. We
expect the key observation above not to change, but it does under the
implementation without this patch. The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body. Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body, summed across the
peeled iterations and the remaining loop, becomes 8.3 (1.0 + 0.9 +
6.4) instead of 10. The new branch weights reveal the problem:
```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```
The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations. It seems
this behavior is trying to ensure a decreasing block frequency.
However, as in the key observation above for the original loop, that
happens correctly without decreasing the branch weights across
iterations.
This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop. The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.
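The difference between the two weight assignments can be reproduced
with a small model. This is only a sketch in plain C++ under a
simplified single-exit assumption, not the BFI implementation; the
names `PeelFrequencies` and `modelPeel` are hypothetical:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical result type: frequencies relative to one function entry.
struct PeelFrequencies {
  std::vector<double> Peeled; // frequency of each peeled iteration
  double Remaining;           // frequency of the remaining loop body
  double Total;               // sum over all copies of the loop body
};

// Hypothetical model (not an LLVM API). Each pair is {exit weight,
// continue weight}. PeeledWeights[i] belongs to peeled iteration i and
// LoopWeights to the latch of the remaining loop.
PeelFrequencies
modelPeel(const std::vector<std::pair<unsigned, unsigned>> &PeeledWeights,
          std::pair<unsigned, unsigned> LoopWeights) {
  PeelFrequencies R{{}, 0.0, 0.0};
  double Reach = 1.0; // frequency of reaching the next copy of the body
  for (const auto &W : PeeledWeights) {
    double Sum = static_cast<double>(W.first) + static_cast<double>(W.second);
    assert(Sum > 0.0 && "need at least one non-zero weight");
    R.Peeled.push_back(Reach);
    R.Total += Reach;
    Reach *= static_cast<double>(W.second) / Sum; // probability of continuing
  }
  assert(LoopWeights.first > 0 && "the remaining loop must be able to exit");
  double LoopSum = static_cast<double>(LoopWeights.first) +
                   static_cast<double>(LoopWeights.second);
  // A loop entered with frequency Reach runs LoopSum / exit weight times.
  R.Remaining = Reach * LoopSum / static_cast<double>(LoopWeights.first);
  R.Total += R.Remaining;
  return R;
}
```

In this model, the decreasing weights {1,9} and {1,8} for the peeled
iterations and {1,7} for the remaining loop give 1.0, 0.9, and 6.4
(total 8.3), whereas using {1,9} everywhere gives 1.0, 0.9, and 8.1
(total 10).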
Unfortunately, that change means a later analysis cannot accurately
estimate the trip count of the remaining loop while examining the
remaining loop in isolation without considering the probability of
actually reaching it. For that purpose, this patch stores the new
trip count as separate metadata named `llvm.loop.estimated_trip_count`
and extends `llvm::getLoopEstimatedTripCount` to prefer it, if
present, over branch weights.
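The lookup order and the peeling update described above can be
sketched as follows. This is an illustrative model in plain C++, not
the actual `llvm::getLoopEstimatedTripCount` or
`llvm::setLoopEstimatedTripCount`; the names `LoopProfile`,
`estimatedTripCount`, and `recordPeel` are hypothetical, the
round-to-nearest division is an assumption, and the latch-branch
restrictions of the real functions are ignored:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-in for the profile data attached to a loop.
struct LoopProfile {
  unsigned ExitWeight;     // latch branch weight toward the exit
  unsigned BackedgeWeight; // latch branch weight toward the header
  // Models llvm.loop.estimated_trip_count, if present.
  std::optional<unsigned> EstimatedTripCountMD;
};

// Prefer the stored trip count; otherwise fall back to branch weights.
std::optional<unsigned> estimatedTripCount(const LoopProfile &P) {
  if (P.EstimatedTripCountMD)
    return P.EstimatedTripCountMD;
  if (P.ExitWeight == 0)
    return std::nullopt;
  std::uint64_t Exit = P.ExitWeight, Back = P.BackedgeWeight;
  // Assumed rounding: backedge-taken count rounded to nearest, plus one.
  std::uint64_t Trip = (Back + Exit / 2) / Exit + 1;
  std::uint64_t Max = std::numeric_limits<unsigned>::max();
  return static_cast<unsigned>(std::min(Trip, Max));
}

// Models a first peel of PeelCount iterations: the branch weights stay
// untouched and the reduced trip count, clamped at 0, is stored separately.
void recordPeel(LoopProfile &P, unsigned PeelCount) {
  assert(PeelCount > 0 && "peeling zero iterations changes nothing");
  std::optional<unsigned> Old = estimatedTripCount(P);
  if (!Old)
    return;
  P.EstimatedTripCountMD = *Old < PeelCount ? 0u : *Old - PeelCount;
}
```

For the example loop with weights 1 and 9, peeling 2 iterations leaves
the weights unchanged and records a trip count of 8, which matches the
`llvm.loop.estimated_trip_count` value checked by the new test.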
An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights. However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix. `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.
This patch introduces several FIXME comments that need to be addressed
before it can land.
---
.../include/llvm/Transforms/Utils/LoopUtils.h | 25 ++-
llvm/lib/Transforms/Utils/LoopPeel.cpp | 145 +++++++-----------
llvm/lib/Transforms/Utils/LoopUtils.cpp | 20 ++-
.../LoopUnroll/peel-branch-weights-freq.ll | 75 +++++++++
.../LoopUnroll/peel-branch-weights.ll | 64 ++++----
.../LoopUnroll/peel-loop-pgo-deopt.ll | 11 +-
.../Transforms/LoopUnroll/peel-loop-pgo.ll | 13 +-
.../Transforms/LoopVectorize/X86/pr81872.ll | 18 ++-
8 files changed, 217 insertions(+), 154 deletions(-)
create mode 100644 llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 8f4c0c88336ac..82d23a4b68ea1 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -315,7 +315,8 @@ TransformationMode hasLICMVersioningTransformation(const Loop *L);
void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
unsigned V = 0);
-/// Returns a loop's estimated trip count based on branch weight metadata.
+/// Returns a loop's estimated trip count based on
+/// llvm.loop.estimated_trip_count metadata or, if none, branch weight metadata.
/// In addition if \p EstimatedLoopInvocationWeight is not null it is
/// initialized with weight of loop's latch leading to the exit.
/// Returns a valid positive trip count, saturated at UINT_MAX, or std::nullopt
@@ -324,13 +325,21 @@ std::optional<unsigned>
getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight = nullptr);
-/// Set a loop's branch weight metadata to reflect that loop has \p
-/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
-/// through latch. Returns true if metadata is successfully updated, false
-/// otherwise. Note that loop must have a latch block which controls loop exit
-/// in order to succeed.
-bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedLoopInvocationWeight);
+/// Set a loop's llvm.loop.estimated_trip_count metadata and, if \p
+/// EstimatedLoopInvocationWeight, branch weight metadata to reflect that loop
+/// has \p EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight
+/// exit weight through latch. Returns true if metadata is successfully updated,
+/// false otherwise. Note that loop must have a latch block which controls loop
+/// exit in order to succeed.
+///
+/// The use case for not setting branch weight metadata is when the original
+/// branch weight metadata is correct for computing block frequencies but the
+/// trip count has changed due to a loop transformation. The branch weight
+/// metadata cannot be adjusted to reflect the new trip count, so we store the
+/// new trip count separately.
+bool setLoopEstimatedTripCount(
+ Loop *L, unsigned EstimatedTripCount,
+ std::optional<unsigned> EstimatedLoopInvocationWeight);
/// Check inner loop (L) backedge count is known to be invariant on all
/// iterations of its outer loop. If the loop has no parent, this is trivially
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 0f3a92b65686c..f9be521b333c3 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -655,84 +655,6 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
}
}
-struct WeightInfo {
- // Weights for current iteration.
- SmallVector<uint32_t> Weights;
- // Weights to subtract after each iteration.
- const SmallVector<uint32_t> SubWeights;
-};
-
-/// Update the branch weights of an exiting block of a peeled-off loop
-/// iteration.
-/// Let F is a weight of the edge to continue (fallthrough) into the loop.
-/// Let E is a weight of the edge to an exit.
-/// F/(F+E) is a probability to go to loop and E/(F+E) is a probability to
-/// go to exit.
-/// Then, Estimated ExitCount = F / E.
-/// For I-th (counting from 0) peeled off iteration we set the weights for
-/// the peeled exit as (EC - I, 1). It gives us reasonable distribution,
-/// The probability to go to exit 1/(EC-I) increases. At the same time
-/// the estimated exit count in the remainder loop reduces by I.
-/// To avoid dealing with division rounding we can just multiple both part
-/// of weights to E and use weight as (F - I * E, E).
-static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
- setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
- for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
- if (SubWeight != 0)
- // Don't set the probability of taking the edge from latch to loop header
- // to less than 1:1 ratio (meaning Weight should not be lower than
- // SubWeight), as this could significantly reduce the loop's hotness,
- // which would be incorrect in the case of underestimating the trip count.
- Info.Weights[Idx] =
- Info.Weights[Idx] > SubWeight
- ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
- : SubWeight;
-}
-
-/// Initialize the weights for all exiting blocks.
-static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
- Loop *L) {
- SmallVector<BasicBlock *> ExitingBlocks;
- L->getExitingBlocks(ExitingBlocks);
- for (BasicBlock *ExitingBlock : ExitingBlocks) {
- Instruction *Term = ExitingBlock->getTerminator();
- SmallVector<uint32_t> Weights;
- if (!extractBranchWeights(*Term, Weights))
- continue;
-
- // See the comment on updateBranchWeights() for an explanation of what we
- // do here.
- uint32_t FallThroughWeights = 0;
- uint32_t ExitWeights = 0;
- for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
- if (L->contains(Succ))
- FallThroughWeights += Weight;
- else
- ExitWeights += Weight;
- }
-
- // Don't try to update weights for degenerate case.
- if (FallThroughWeights == 0)
- continue;
-
- SmallVector<uint32_t> SubWeights;
- for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
- if (!L->contains(Succ)) {
- // Exit weights stay the same.
- SubWeights.push_back(0);
- continue;
- }
-
- // Subtract exit weights on each iteration, distributed across all
- // fallthrough edges.
- double W = (double)Weight / (double)FallThroughWeights;
- SubWeights.push_back((uint32_t)(ExitWeights * W));
- }
-
- WeightInfos.insert({Term, {std::move(Weights), std::move(SubWeights)}});
- }
-}
-
/// Clones the body of the loop L, putting it between \p InsertTop and \p
/// InsertBot.
/// \param IterNumber The serial number of the iteration currently being
@@ -1006,11 +928,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
Instruction *LatchTerm =
cast<Instruction>(cast<BasicBlock>(Latch)->getTerminator());
- // If we have branch weight information, we'll want to update it for the
- // newly created branches.
- DenseMap<Instruction *, WeightInfo> Weights;
- initBranchWeights(Weights, L);
-
// Identify what noalias metadata is inside the loop: if it is inside the
// loop, the associated metadata must be cloned for each iteration.
SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
@@ -1038,11 +955,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
assert(DT.verify(DominatorTree::VerificationLevel::Fast));
#endif
- for (auto &[Term, Info] : Weights) {
- auto *TermCopy = cast<Instruction>(VMap[Term]);
- updateBranchWeights(TermCopy, Info);
- }
-
// Remove Loop metadata from the latch branch instruction
// because it is not the Loop's latch branch anymore.
auto *LatchTermCopy = cast<Instruction>(VMap[LatchTerm]);
@@ -1068,15 +980,62 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
PHI->setIncomingValueForBlock(NewPreHeader, NewVal);
}
- for (const auto &[Term, Info] : Weights) {
- setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
- }
-
// Update Metadata for count of peeled off iterations.
unsigned AlreadyPeeled = 0;
if (auto Peeled = getOptionalIntLoopAttribute(L, PeeledCountMetaData))
AlreadyPeeled = *Peeled;
- addStringMetadataToLoop(L, PeeledCountMetaData, AlreadyPeeled + PeelCount);
+ unsigned TotalPeeled = AlreadyPeeled + PeelCount;
+ addStringMetadataToLoop(L, PeeledCountMetaData, TotalPeeled);
+
+ // Update metadata for the estimated trip count. The original branch weight
+ // metadata is already correct for both the remaining loop and the peeled loop
+ // iterations, so don't adjust it.
+ //
+ // For example, consider what happens when peeling 2 iterations from a loop
+ // with an estimated trip count of 10 and inserting them before the remaining
+ // loop. Each of the peeled iterations and each iteration in the remaining
+ // loop still has the same probability of exiting the *entire original* loop
+ // as it did when in the original loop, and thus it should still have the same
+ // branch weights. The peeled iterations' non-zero probabilities of exiting
+ // already appropriately reduce the probability of reaching the remaining
+ // iterations just as they did in the original loop. Trying to also adjust
+ // the remaining loop's branch weights to reflect its new trip count of 8 will
+ // erroneously further reduce its block frequencies. However, in case an
+ // analysis later needs to determine the trip count of the remaining loop
+ // while examining it in isolation without considering the probability of
+ // actually reaching it, we store the new trip count as separate metadata.
+ //
+ // FIXME: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
+ // that don't match the restrictions of getExpectedExitLoopLatchBranch in
+ // LoopUtils.cpp. For example,
+ // llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll (introduced by
+ // b43a4d0850d5) has multiple exits. Should we try to extend them to handle
+ // such cases? For now, we just don't try to record
+ // llvm.loop.estimated_trip_count for such cases, so the original branch
+ // weights will have to do.
+ if (auto EstimatedTripCount = getLoopEstimatedTripCount(L)) {
+ // FIXME: The previous updateBranchWeights implementation had this
+ // comment:
+ //
+ // Don't set the probability of taking the edge from latch to loop header
+ // to less than 1:1 ratio (meaning Weight should not be lower than
+ // SubWeight), as this could significantly reduce the loop's hotness,
+ // which would be incorrect in the case of underestimating the trip count.
+ //
+ // See e8d5db206c2f commit log for further discussion. That seems to
+ // suggest that we should avoid ever setting a trip count of < 2 here
+ // (equal chance of continuing and exiting means the loop will likely
+ // continue once and then exit once). Or is keeping the original branch
+ // weights already a sufficient improvement for whatever analysis cares
+ // about this case?
+ unsigned EstimatedTripCountNew = *EstimatedTripCount;
+ if (EstimatedTripCountNew < TotalPeeled) // FIXME: TotalPeeled + 2?
+ EstimatedTripCountNew = 0; // FIXME: = 2?
+ else
+ EstimatedTripCountNew -= TotalPeeled;
+ setLoopEstimatedTripCount(L, EstimatedTripCountNew,
+ /*EstimatedLoopInvocationWeight=*/std::nullopt);
+ }
if (Loop *ParentLoop = L->getParentLoop())
L = ParentLoop;
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 84c08556f8a25..ae91dbd5bf902 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -53,6 +53,8 @@ using namespace llvm::PatternMatch;
static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
+static const char *LLVMLoopEstimatedTripCount =
+ "llvm.loop.estimated_trip_count";
bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
MemorySSAUpdater *MSSAU,
@@ -864,14 +866,22 @@ llvm::getLoopEstimatedTripCount(Loop *L,
getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
if (EstimatedLoopInvocationWeight)
*EstimatedLoopInvocationWeight = ExitWeight;
+ // FIXME: Where else are branch weights directly used for estimating loop
+ // trip counts? They should also be updated to use
+ // LLVMLoopEstimatedTripCount when present... or to just call this
+ // function.
+ if (auto EstimatedTripCount =
+ getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount))
+ return EstimatedTripCount;
return *EstTripCount;
}
}
return std::nullopt;
}
-bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedloopInvocationWeight) {
+bool llvm::setLoopEstimatedTripCount(
+ Loop *L, unsigned EstimatedTripCount,
+ std::optional<unsigned> EstimatedloopInvocationWeight) {
// At the moment, we currently support changing the estimate trip count of
// the latch branch only. We could extend this API to manipulate estimated
// trip counts for any exit.
@@ -879,12 +889,16 @@ bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
if (!LatchBranch)
return false;
+ addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount, EstimatedTripCount);
+ if (!EstimatedloopInvocationWeight)
+ return true;
+
// Calculate taken and exit weights.
unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;
if (EstimatedTripCount > 0) {
- LatchExitWeight = EstimatedloopInvocationWeight;
+ LatchExitWeight = *EstimatedloopInvocationWeight;
BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
new file mode 100644
index 0000000000000..c566b8ad6ea37
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
@@ -0,0 +1,75 @@
+; Test branch weights metadata, estimated trip count metadata, and block
+; frequencies after loop peeling.
+
+; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
+; RUN: FileCheck -check-prefix=CHECK %s
+
+; The -implicit-check-not options make sure that no additional labels or calls
+; to @f show up.
+; RUN: opt < %s -S -passes='loop-unroll,print<block-freq>' \
+; RUN: -unroll-force-peel-count=2 2>&1 | \
+; RUN: FileCheck %s -check-prefix=CHECK-UR \
+; RUN: -implicit-check-not='{{^[^ ;]*:}}' \
+; RUN: -implicit-check-not='call void @f'
+
+; CHECK: block-frequency-info: test
+; CHECK: do.body: float = 10.0,
+
+; The sum should still be ~10.
+;
+; CHECK-UR: block-frequency-info: test
+; CHECK-UR: - [[DO_BODY_PEEL:.*]]: float = 1.0,
+; CHECK-UR: - [[DO_BODY_PEEL2:.*]]: float = 0.9,
+; CHECK-UR: - [[DO_BODY:.*]]: float = 8.1,
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+; CHECK-UR-LABEL: define void @test(
+; CHECK-UR: [[ENTRY:.*]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL_BEGIN:.*]]
+; CHECK-UR: [[DO_BODY_PEEL_BEGIN]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL:.*]]
+; CHECK-UR: [[DO_BODY_PEEL]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END:.*]], label %[[DO_BODY_PEEL_NEXT:.*]], !prof ![[#PROF:]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL2:.*]]
+; CHECK-UR: [[DO_BODY_PEEL2]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_PEEL_NEXT1:.*]], !prof ![[#PROF]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT1]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL_NEXT5:.*]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT5]]:
+; CHECK-UR: br label %[[ENTRY_PEEL_NEWPH:.*]]
+; CHECK-UR: [[ENTRY_PEEL_NEWPH]]:
+; CHECK-UR: br label %[[DO_BODY]]
+; CHECK-UR: [[DO_BODY]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END_LOOPEXIT:.*]], label %[[DO_BODY]], !prof ![[#PROF]], !llvm.loop ![[#LOOP_UR_LATCH:]]
+; CHECK-UR: [[DO_END_LOOPEXIT]]:
+; CHECK-UR: br label %[[DO_END]]
+; CHECK-UR: [[DO_END]]:
+; CHECK-UR: ret void
+
+entry:
+ br label %do.body
+
+do.body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
+ %inc = add i32 %i, 1
+ call void @f(i32 %i)
+ %c = icmp sge i32 %inc, %n
+ br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+ ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 9}
+
+; CHECK-UR: ![[#PROF]] = !{!"branch_weights", i32 1, i32 9}
+; CHECK-UR: ![[#LOOP_UR_LATCH]] = distinct !{![[#LOOP_UR_LATCH]], ![[#LOOP_UR_PC:]], ![[#LOOP_UR_TC:]], ![[#DISABLE:]]}
+; CHECK-UR: ![[#LOOP_UR_PC]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK-UR: ![[#LOOP_UR_TC]] = !{!"llvm.loop.estimated_trip_count", i32 8}
+; CHECK-UR: ![[#DISABLE]] = !{!"llvm.loop.unroll.disable"}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
index c58f8f1f4e4ee..63a0dd4b4b4f9 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
@@ -15,9 +15,9 @@ define void @test() {
; CHECK: loop.peel:
; CHECK-NEXT: [[X_PEEL:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X_PEEL]], label [[LOOP_LATCH_PEEL:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
; CHECK-NEXT: ], !prof [[PROF0:![0-9]+]]
; CHECK: loop.latch.peel:
; CHECK-NEXT: br label [[LOOP_PEEL_NEXT:%.*]]
@@ -26,10 +26,10 @@ define void @test() {
; CHECK: loop.peel2:
; CHECK-NEXT: [[X_PEEL3:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X_PEEL3]], label [[LOOP_LATCH_PEEL4:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
-; CHECK-NEXT: ], !prof [[PROF1:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
; CHECK: loop.latch.peel4:
; CHECK-NEXT: br label [[LOOP_PEEL_NEXT1:%.*]]
; CHECK: loop.peel.next1:
@@ -41,31 +41,33 @@ define void @test() {
; CHECK: loop:
; CHECK-NEXT: [[X:%.*]] = call i32 @get.x()
; CHECK-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
-; CHECK-NEXT: ], !prof [[PROF2:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
; CHECK: loop.latch:
-; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP1:![0-9]+]]
; CHECK: loop.exit.loopexit:
; CHECK-NEXT: br label [[LOOP_EXIT]]
; CHECK: loop.exit:
; CHECK-NEXT: ret void
+;
+; DISABLEADV-LABEL: @test(
+; DISABLEADV-NEXT: entry:
+; DISABLEADV-NEXT: br label [[LOOP:%.*]]
+; DISABLEADV: loop:
+; DISABLEADV-NEXT: [[X:%.*]] = call i32 @get.x()
+; DISABLEADV-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
+; DISABLEADV-NEXT: i32 0, label [[LOOP_LATCH]]
+; DISABLEADV-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; DISABLEADV-NEXT: i32 2, label [[LOOP_EXIT]]
+; DISABLEADV-NEXT: ], !prof [[PROF0:![0-9]+]]
+; DISABLEADV: loop.latch:
+; DISABLEADV-NEXT: br label [[LOOP]]
+; DISABLEADV: loop.exit:
+; DISABLEADV-NEXT: ret void
+;
-; DISABLEADV-LABEL: @test()
-; DISABLEADV-NEXT: entry:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop
-; DISABLEADV-NEXT: %x = call i32 @get.x()
-; DISABLEADV-NEXT: switch i32 %x, label %loop.latch [
-; DISABLEADV-NEXT: i32 0, label %loop.latch
-; DISABLEADV-NEXT: i32 1, label %loop.exit
-; DISABLEADV-NEXT: i32 2, label %loop.exit
-; DISABLEADV-NEXT: ], !prof !0
-; DISABLEADV: loop.latch:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop.exit:
-; DISABLEADV-NEXT: ret void
entry:
br label %loop
@@ -89,9 +91,9 @@ loop.exit:
;.
; CHECK: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
-; CHECK: [[PROF1]] = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
-; CHECK: [[PROF2]] = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
-; CHECK: [[LOOP3]] = distinct !{!3, !4, !5}
-; CHECK: [[META4:![0-9]+]] = !{!"llvm.loop.peeled.count", i32 2}
-; CHECK: [[META5:![0-9]+]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+;.
+; DISABLEADV: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
;.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index d91cb5bab3827..e95121593e4f7 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -15,13 +15,13 @@
; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT0]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit, !prof !15
-; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !17
+; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT1]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit, !prof !15
-; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !18
+; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !16
; CHECK: [[NEXT2]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit.loopexit, !prof !15
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !16, !llvm.loop !17
define i32 @basic(ptr %p, i32 %k, i1 %c) #0 !prof !15 {
entry:
@@ -84,6 +84,7 @@ attributes #1 = { nounwind optsize }
;CHECK: !15 = !{!"branch_weights", i32 1, i32 0}
; This is a weights of latch and its copies.
;CHECK: !16 = !{!"branch_weights", i32 3001, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 1001, i32 1001}
+;CHECK: !17 = distinct !{!17, !18, !19, {{.*}}}
+;CHECK: !18 = !{!"llvm.loop.peeled.count", i32 4}
+;CHECK: !19 = !{!"llvm.loop.estimated_trip_count", i32 0}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
index 15dce234baee9..dec126f289d32 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
@@ -5,7 +5,7 @@
; RUN: opt < %s -S -profile-summary-huge-working-set-size-threshold=9 -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s --check-prefix=NOPEEL
; REQUIRES: asserts
-; Make sure we use the profile information correctly to peel-off 3 iterations
+; Make sure we use the profile information correctly to peel-off 4 iterations
; from the loop, and update the branch weights for the peeled loop properly.
; CHECK: Loop Unroll: F[basic]
@@ -20,11 +20,11 @@
; CHECK-LABEL: @basic
; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT0]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !16
+; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT1]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
+; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !15
; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !17
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !15, !llvm.loop !16
define void @basic(ptr %p, i32 %k) #0 !prof !15 {
entry:
@@ -104,6 +104,7 @@ attributes #1 = { nounwind optsize }
!16 = !{!"branch_weights", i32 3001, i32 1001}
;CHECK: !15 = !{!"branch_weights", i32 3001, i32 1001}
-;CHECK: !16 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 1001, i32 1001}
+;CHECK: !16 = distinct !{!16, !17, !18, {{.*}}}
+;CHECK: !17 = !{!"llvm.loop.peeled.count", i32 4}
+;CHECK: !18 = !{!"llvm.loop.estimated_trip_count", i32 0}
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
index a190e94a01489..da13cdc7ef070 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
@@ -40,7 +40,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 12
; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[BB6:%.*]], label [[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-NEXT: br i1 true, label [[BB6:%.*]], label [[SCALAR_PH]], !prof [[PROF6:![0-9]+]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 87, [[MIDDLE_BLOCK]] ], [ 99, [[BB5:%.*]] ]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
@@ -48,7 +48,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[AND:%.*]] = and i64 [[IV]], 1
; CHECK-NEXT: [[ICMP17:%.*]] = icmp eq i64 [[AND]], 0
-; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF6:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF7:![0-9]+]]
; CHECK: bb18:
; CHECK-NEXT: [[OR:%.*]] = or disjoint i64 [[IV]], 1
; CHECK-NEXT: [[GETELEMENTPTR19:%.*]] = getelementptr inbounds i64, ptr [[ARR]], i64 [[OR]]
@@ -57,7 +57,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nsw i64 [[IV]], -1
; CHECK-NEXT: [[ICMP22:%.*]] = icmp eq i64 [[IV_NEXT]], 90
-; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF8:![0-9]+]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: bb6:
; CHECK-NEXT: ret void
;
@@ -98,11 +98,13 @@ attributes #0 = {"target-cpu"="haswell" "target-features"="+avx2" }
;.
; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK: [[PROF1]] = !{!"branch_weights", i32 1, i32 23}
-; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
-; CHECK: [[PROF6]] = !{!"branch_weights", i32 1, i32 1}
-; CHECK: [[PROF7]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META4]], [[META3]]}
+; CHECK: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 24}
+; CHECK: [[PROF6]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK: [[PROF7]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK: [[PROF8]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META10:![0-9]+]], [[META4]], [[META3]]}
+; CHECK: [[META10]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
>From f821eeb26608d27e1e4909899018bb751d243e75 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Wed, 26 Mar 2025 14:20:24 -0400
Subject: [PATCH 02/27] Run update_test_checks.py on a test
---
llvm/test/Transforms/LoopUnroll/scev-invalidation-lcssa.ll | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/test/Transforms/LoopUnroll/scev-invalidation-lcssa.ll b/llvm/test/Transforms/LoopUnroll/scev-invalidation-lcssa.ll
index ec71c67d250b4..0a3d201e617de 100644
--- a/llvm/test/Transforms/LoopUnroll/scev-invalidation-lcssa.ll
+++ b/llvm/test/Transforms/LoopUnroll/scev-invalidation-lcssa.ll
@@ -3,7 +3,7 @@
define i32 @f(i1 %cond1) #0 !prof !0 {
; CHECK-LABEL: define i32 @f
-; CHECK-SAME: (i1 [[COND1:%.*]]) !prof [[PROF0:![0-9]+]] {
+; CHECK-SAME: (i1 [[COND1:%.*]]) {{.*}}{
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP1_PEEL_BEGIN:%.*]]
; CHECK: loop1.peel.begin:
@@ -19,7 +19,7 @@ define i32 @f(i1 %cond1) #0 !prof !0 {
; CHECK-NEXT: br label [[LOOP1:%.*]]
; CHECK: loop1:
; CHECK-NEXT: [[LD:%.*]] = load i64, ptr null, align 8
-; CHECK-NEXT: br i1 [[COND1]], label [[LOOP1]], label [[EXIT1_LOOPEXIT:%.*]], !prof [[PROF2:![0-9]+]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[COND1]], label [[LOOP1]], label [[EXIT1_LOOPEXIT:%.*]], !prof [[PROF1]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK: exit1.loopexit:
; CHECK-NEXT: [[LD_LCSSA_PH:%.*]] = phi i64 [ [[LD]], [[LOOP1]] ]
; CHECK-NEXT: br label [[EXIT1]]
>From af8ec5637b9ccb7a6337d8cd1f7ac1728b4fca1f Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Fri, 4 Apr 2025 12:11:18 -0400
Subject: [PATCH 03/27] Fix typo
---
llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
index c566b8ad6ea37..1339afe146f21 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
@@ -1,4 +1,4 @@
-; Test branch weights metadata, estimated trip count metadata, and block
+; Test branch weight metadata, estimated trip count metadata, and block
; frequencies after loop peeling.
; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
>From 630317790e43b9a65b9fa11701b282caebd6eaa9 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Wed, 9 Apr 2025 20:09:12 -0400
Subject: [PATCH 04/27] Document new metadata
---
llvm/docs/LangRef.rst | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 769003a90f959..103fb6740f028 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7866,6 +7866,17 @@ The attributes in this metadata is added to all followup loops of the
loop distribution pass. See
:ref:`Transformation Metadata <transformation-metadata>` for details.
+'``llvm.loop.estimated_trip_count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata records the loop's estimated trip count. If it is not present, a
+loop's estimated trip count should be computed from any ``branch_weights``
+metadata attached to the latch block's branch instruction.
+
+Thus, this metadata frees loop transformations to compute latch branch weights
+solely for the purpose of maintaining accurate block frequencies instead of
+requiring the branch weights to always serve both roles.
+
'``llvm.licm.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>From bbd0e95b231abbb2c74c3414d503f5026c1348f5 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 1 May 2025 13:33:01 -0400
Subject: [PATCH 05/27] Improve LangRef.rst entry
---
llvm/docs/LangRef.rst | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 103fb6740f028..4652255b382ef 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7869,13 +7869,31 @@ loop distribution pass. See
'``llvm.loop.estimated_trip_count``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-This metadata records the loop's estimated trip count. If it is not present, a
-loop's estimated trip count should be computed from any ``branch_weights``
-metadata attached to the latch block's branch instruction.
+This metadata records the loop's estimated trip count. The first
+operand is the string ``llvm.loop.estimated_trip_count`` and the
+second operand is an integer specifying the count. For example:
-Thus, this metadata frees loop transformations to compute latch branch weights
-solely for the purpose of maintaining accurate block frequencies instead of
-requiring the branch weights to always serve both roles.
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.estimated_trip_count", i32 8}
+
+A loop's estimated trip count is an estimate of the average number of
+loop iterations (specifically, the number of times the loop's header
+executes) each time execution reaches the loop. It is usually only an
+estimate based on, for example, profile data. The actual number of
+iterations might vary widely.
+
+The estimated trip count serves as a parameter for various loop
+transformations and typically helps estimate transformation cost. For
+example, it can help determine how many iterations to peel or how
+aggressively to unroll.
+
+If this metadata is not present, such passes compute the estimated
+trip count from any ``branch_weights`` metadata attached to the latch
+block's branch instruction. Thus, this metadata frees loop
+transformations to compute latch branch weights solely for the purpose
+of maintaining accurate block frequencies instead of requiring the
+branch weights to always serve both roles.
'``llvm.licm.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>From 37ce859f1dd0d7e1a6519bef4ee3cacb212062c7 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 16 Jun 2025 13:48:40 -0400
Subject: [PATCH 06/27] Update fixmes
Extending beyond the limitations of `getExpectedExitLoopLatchBranch`
is a possible improvement for the future, not an urgent fixme.
No one has pointed out code that computes estimated trip counts
without using `llvm::getLoopEstimatedTripCount`.
---
llvm/lib/Transforms/Utils/LoopPeel.cpp | 2 +-
llvm/lib/Transforms/Utils/LoopUtils.cpp | 4 ----
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 48b18dad67576..5cb3fa1f5595e 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -1219,7 +1219,7 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
// while examining it in isolation without considering the probability of
// actually reaching it, we store the new trip count as separate metadata.
//
- // FIXME: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
+ // TODO: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
// that don't match the restrictions of getExpectedExitLoopLatchBranch in
// LoopUtils.cpp. For example,
// llvm/tests/Transforms/LoopUnroll/peel-branch-weights.ll (introduced by
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 9145a537e2248..fb016da99aa93 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -866,10 +866,6 @@ llvm::getLoopEstimatedTripCount(Loop *L,
getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
if (EstimatedLoopInvocationWeight)
*EstimatedLoopInvocationWeight = ExitWeight;
- // FIXME: Where else are branch weights directly used for estimating loop
- // trip counts? They should also be updated to use
- // LLVMLoopEstimatedTripCount when present... or to just call this
- // function.
if (auto EstimatedTripCount =
getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount))
return EstimatedTripCount;
>From 51931581b82b4b7451f9f0bb61d327959ce44e3b Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Tue, 17 Jun 2025 10:58:00 -0400
Subject: [PATCH 07/27] Update test for AArch64, which I did not build before
---
.../LoopVectorize/AArch64/check-prof-info.ll | 54 ++++++++++---------
1 file changed, 30 insertions(+), 24 deletions(-)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
index 9435c544fc812..2aa507debeba2 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
@@ -20,11 +20,11 @@ define void @_Z3foov() {
; CHECK-V1-IC1: [[VECTOR_BODY]]:
; CHECK-V1-IC1: br i1 [[TMP10:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP1:![0-9]+]]
; CHECK-V1-IC1: [[MIDDLE_BLOCK]]:
-; CHECK-V1-IC1: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF4:![0-9]+]]
+; CHECK-V1-IC1: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
; CHECK-V1-IC1: [[SCALAR_PH]]:
; CHECK-V1-IC1: br label %[[FOR_BODY:.*]]
; CHECK-V1-IC1: [[FOR_BODY]]:
-; CHECK-V1-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF5:![0-9]+]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-V1-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK-V1-IC1: [[FOR_COND_CLEANUP]]:
;
; CHECK-V2-IC1-LABEL: define void @_Z3foov(
@@ -36,11 +36,11 @@ define void @_Z3foov() {
; CHECK-V2-IC1: [[VECTOR_BODY]]:
; CHECK-V2-IC1: br i1 [[TMP4:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK-V2-IC1: [[MIDDLE_BLOCK]]:
-; CHECK-V2-IC1: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC1: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF6:![0-9]+]]
; CHECK-V2-IC1: [[SCALAR_PH]]:
; CHECK-V2-IC1: br label %[[FOR_BODY:.*]]
; CHECK-V2-IC1: [[FOR_BODY]]:
-; CHECK-V2-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK-V2-IC1: [[FOR_COND_CLEANUP]]:
;
; CHECK-V2-IC4-LABEL: define void @_Z3foov(
@@ -54,19 +54,19 @@ define void @_Z3foov() {
; CHECK-V2-IC4: [[VECTOR_BODY]]:
; CHECK-V2-IC4: br i1 [[TMP12:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK-V2-IC4: [[MIDDLE_BLOCK]]:
-; CHECK-V2-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF6:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-V2-IC4: br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF6:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF7:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_PH]]:
; CHECK-V2-IC4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; CHECK-V2-IC4: [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-V2-IC4: br i1 [[TMP23:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[TMP23:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
-; CHECK-V2-IC4: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF10:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_SCALAR_PH]]:
; CHECK-V2-IC4: br label %[[FOR_BODY:.*]]
; CHECK-V2-IC4: [[FOR_BODY]]:
-; CHECK-V2-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF11:![0-9]+]], !llvm.loop [[LOOP12:![0-9]+]]
; CHECK-V2-IC4: [[FOR_COND_CLEANUP]]:
;
entry:
@@ -89,31 +89,37 @@ for.cond.cleanup: ; preds = %for.body
!0 = !{!"branch_weights", i32 1, i32 1023}
;.
; CHECK-V1-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
-; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
; CHECK-V1-IC1: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V1-IC1: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V1-IC1: [[PROF4]] = !{!"branch_weights", i32 1, i32 3}
-; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V1-IC1: [[LOOP6]] = distinct !{[[LOOP6]], [[META3]], [[META2]]}
+; CHECK-V1-IC1: [[META4]] = !{!"llvm.loop.estimated_trip_count", i32 128}
+; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V1-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V1-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META3]], [[META2]]}
+; CHECK-V1-IC1: [[META8]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
; CHECK-V2-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK-V2-IC1: [[PROF1]] = !{!"branch_weights", i32 1, i32 255}
-; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK-V2-IC1: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V2-IC1: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V2-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
-; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V2-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META4]], [[META3]]}
+; CHECK-V2-IC1: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 256}
+; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V2-IC1: [[PROF7]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC1: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META4]], [[META3]]}
+; CHECK-V2-IC1: [[META9]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
; CHECK-V2-IC4: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK-V2-IC4: [[PROF1]] = !{!"branch_weights", i32 1, i32 63}
-; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK-V2-IC4: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V2-IC4: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V2-IC4: [[PROF5]] = !{!"branch_weights", i32 1, i32 15}
-; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 2, i32 0}
-; CHECK-V2-IC4: [[LOOP7]] = distinct !{[[LOOP7]], [[META3]], [[META4]]}
-; CHECK-V2-IC4: [[PROF8]] = !{!"branch_weights", i32 1, i32 1}
-; CHECK-V2-IC4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V2-IC4: [[LOOP10]] = distinct !{[[LOOP10]], [[META4]], [[META3]]}
+; CHECK-V2-IC4: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 64}
+; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 1, i32 15}
+; CHECK-V2-IC4: [[PROF7]] = !{!"branch_weights", i32 2, i32 0}
+; CHECK-V2-IC4: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META3]], [[META4]]}
+; CHECK-V2-IC4: [[META9]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; CHECK-V2-IC4: [[PROF10]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK-V2-IC4: [[PROF11]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC4: [[LOOP12]] = distinct !{[[LOOP12]], [[META9]], [[META4]], [[META3]]}
;.
>From b23f46747417f5267b877f5563acab78a4d518ea Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 10 Jul 2025 10:58:31 -0400
Subject: [PATCH 08/27] Run update script on test changed by merge from main
The update adds this PR's new metadata.
---
.../LoopVectorize/branch-weights.ll | 56 ++++++++++---------
1 file changed, 31 insertions(+), 25 deletions(-)
diff --git a/llvm/test/Transforms/LoopVectorize/branch-weights.ll b/llvm/test/Transforms/LoopVectorize/branch-weights.ll
index 6892709f085f7..08bc920cef0e0 100644
--- a/llvm/test/Transforms/LoopVectorize/branch-weights.ll
+++ b/llvm/test/Transforms/LoopVectorize/branch-weights.ll
@@ -34,23 +34,23 @@ define void @f0(i8 %n, i32 %len, ptr %p) !prof !0 {
; MAINVF4IC1_EPI4: br i1 [[TMP8]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]
; MAINVF4IC1_EPI4: [[MIDDLE_BLOCK]]:
; MAINVF4IC1_EPI4: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF7:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF8:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_ITER_CHECK]]:
; MAINVF4IC1_EPI4: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i32 [[N_VEC_REMAINING:%.*]], 4
-; MAINVF4IC1_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF8:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF9:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_PH]]:
; MAINVF4IC1_EPI4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_VECTOR_BODY]]:
; MAINVF4IC1_EPI4: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT6:%.*]], [[N_VEC3:%.*]]
-; MAINVF4IC1_EPI4: br i1 [[TMP12]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[TMP12]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF10:![0-9]+]], !llvm.loop [[LOOP11:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; MAINVF4IC1_EPI4: [[CMP_N8:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC3]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF7]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_SCALAR_PH]]:
; MAINVF4IC1_EPI4: br label %[[LOOP:.*]]
; MAINVF4IC1_EPI4: [[LOOP]]:
; MAINVF4IC1_EPI4: [[CMP_LOOP:%.*]] = icmp ult i32 [[I32:%.*]], [[LEN]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF11:![0-9]+]], !llvm.loop [[LOOP12:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF13:![0-9]+]], !llvm.loop [[LOOP14:![0-9]+]]
; MAINVF4IC1_EPI4: [[EXIT_LOOPEXIT]]:
; MAINVF4IC1_EPI4: br label %[[EXIT]]
; MAINVF4IC1_EPI4: [[EXIT]]:
@@ -77,23 +77,23 @@ define void @f0(i8 %n, i32 %len, ptr %p) !prof !0 {
; MAINVF4IC2_EPI4: br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]
; MAINVF4IC2_EPI4: [[MIDDLE_BLOCK]]:
; MAINVF4IC2_EPI4: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF7:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF8:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_ITER_CHECK]]:
; MAINVF4IC2_EPI4: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i32 [[N_VEC_REMAINING:%.*]], 4
-; MAINVF4IC2_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF8:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF9:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_PH]]:
; MAINVF4IC2_EPI4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_VECTOR_BODY]]:
; MAINVF4IC2_EPI4: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT6:%.*]], [[N_VEC3:%.*]]
-; MAINVF4IC2_EPI4: br i1 [[TMP13]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[TMP13]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF10:![0-9]+]], !llvm.loop [[LOOP11:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; MAINVF4IC2_EPI4: [[CMP_N8:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC3]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF13:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_SCALAR_PH]]:
; MAINVF4IC2_EPI4: br label %[[LOOP:.*]]
; MAINVF4IC2_EPI4: [[LOOP]]:
; MAINVF4IC2_EPI4: [[CMP_LOOP:%.*]] = icmp ult i32 [[I32:%.*]], [[LEN]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF14:![0-9]+]], !llvm.loop [[LOOP15:![0-9]+]]
; MAINVF4IC2_EPI4: [[EXIT_LOOPEXIT]]:
; MAINVF4IC2_EPI4: br label %[[EXIT]]
; MAINVF4IC2_EPI4: [[EXIT]]:
@@ -127,28 +127,34 @@ exit:
; MAINVF4IC1_EPI4: [[PROF1]] = !{!"branch_weights", i32 12, i32 1}
; MAINVF4IC1_EPI4: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
; MAINVF4IC1_EPI4: [[PROF3]] = !{!"branch_weights", i32 1, i32 307}
-; MAINVF4IC1_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
+; MAINVF4IC1_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
; MAINVF4IC1_EPI4: [[META5]] = !{!"llvm.loop.isvectorized", i32 1}
; MAINVF4IC1_EPI4: [[META6]] = !{!"llvm.loop.unroll.runtime.disable"}
-; MAINVF4IC1_EPI4: [[PROF7]] = !{!"branch_weights", i32 1, i32 3}
-; MAINVF4IC1_EPI4: [[PROF8]] = !{!"branch_weights", i32 4, i32 0}
-; MAINVF4IC1_EPI4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; MAINVF4IC1_EPI4: [[LOOP10]] = distinct !{[[LOOP10]], [[META5]], [[META6]]}
-; MAINVF4IC1_EPI4: [[PROF11]] = !{!"branch_weights", i32 2, i32 1}
-; MAINVF4IC1_EPI4: [[LOOP12]] = distinct !{[[LOOP12]], [[META5]]}
+; MAINVF4IC1_EPI4: [[META7]] = !{!"llvm.loop.estimated_trip_count", i32 308}
+; MAINVF4IC1_EPI4: [[PROF8]] = !{!"branch_weights", i32 1, i32 3}
+; MAINVF4IC1_EPI4: [[PROF9]] = !{!"branch_weights", i32 4, i32 0}
+; MAINVF4IC1_EPI4: [[PROF10]] = !{!"branch_weights", i32 0, i32 0}
+; MAINVF4IC1_EPI4: [[LOOP11]] = distinct !{[[LOOP11]], [[META5]], [[META6]], [[META12:![0-9]+]]}
+; MAINVF4IC1_EPI4: [[META12]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; MAINVF4IC1_EPI4: [[PROF13]] = !{!"branch_weights", i32 2, i32 1}
+; MAINVF4IC1_EPI4: [[LOOP14]] = distinct !{[[LOOP14]], [[META15:![0-9]+]], [[META5]]}
+; MAINVF4IC1_EPI4: [[META15]] = !{!"llvm.loop.estimated_trip_count", i32 3}
;.
; MAINVF4IC2_EPI4: [[PROF0]] = !{!"function_entry_count", i64 13}
; MAINVF4IC2_EPI4: [[PROF1]] = !{!"branch_weights", i32 12, i32 1}
; MAINVF4IC2_EPI4: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
; MAINVF4IC2_EPI4: [[PROF3]] = !{!"branch_weights", i32 1, i32 153}
-; MAINVF4IC2_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
+; MAINVF4IC2_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
; MAINVF4IC2_EPI4: [[META5]] = !{!"llvm.loop.isvectorized", i32 1}
; MAINVF4IC2_EPI4: [[META6]] = !{!"llvm.loop.unroll.runtime.disable"}
-; MAINVF4IC2_EPI4: [[PROF7]] = !{!"branch_weights", i32 1, i32 7}
-; MAINVF4IC2_EPI4: [[PROF8]] = !{!"branch_weights", i32 4, i32 0}
-; MAINVF4IC2_EPI4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; MAINVF4IC2_EPI4: [[LOOP10]] = distinct !{[[LOOP10]], [[META5]], [[META6]]}
-; MAINVF4IC2_EPI4: [[PROF11]] = !{!"branch_weights", i32 1, i32 3}
-; MAINVF4IC2_EPI4: [[PROF12]] = !{!"branch_weights", i32 2, i32 1}
-; MAINVF4IC2_EPI4: [[LOOP13]] = distinct !{[[LOOP13]], [[META5]]}
+; MAINVF4IC2_EPI4: [[META7]] = !{!"llvm.loop.estimated_trip_count", i32 154}
+; MAINVF4IC2_EPI4: [[PROF8]] = !{!"branch_weights", i32 1, i32 7}
+; MAINVF4IC2_EPI4: [[PROF9]] = !{!"branch_weights", i32 4, i32 0}
+; MAINVF4IC2_EPI4: [[PROF10]] = !{!"branch_weights", i32 0, i32 0}
+; MAINVF4IC2_EPI4: [[LOOP11]] = distinct !{[[LOOP11]], [[META5]], [[META6]], [[META12:![0-9]+]]}
+; MAINVF4IC2_EPI4: [[META12]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; MAINVF4IC2_EPI4: [[PROF13]] = !{!"branch_weights", i32 1, i32 3}
+; MAINVF4IC2_EPI4: [[PROF14]] = !{!"branch_weights", i32 2, i32 1}
+; MAINVF4IC2_EPI4: [[LOOP15]] = distinct !{[[LOOP15]], [[META16:![0-9]+]], [[META5]]}
+; MAINVF4IC2_EPI4: [[META16]] = !{!"llvm.loop.estimated_trip_count", i32 3}
;.
>From 13d1fbbbac1ea171f99175ab7278454f462d1710 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 14 Jul 2025 20:42:08 -0400
Subject: [PATCH 09/27] [PGO] Add `llvm.loop.estimated_trip_count` metadata
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights`
metadata. As [suggested in the PR#128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a `PGOEstimateTripCountsPass` pass, which creates the
new metadata for the loop but omits the value if it cannot estimate a
trip count due to the loop's form.
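For illustration, estimating a trip count from a latch branch's
`branch_weights` can be sketched as below. This is a simplified,
standalone model and not LLVM's implementation: `estimateTripCount`
and `checkAgainstTestValues` are hypothetical names, and treating a
zero exit weight as "no estimate" is an assumption of the sketch. It
rounds the backedge-to-exit weight ratio and adds one for the final
iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): estimate a loop's trip count from
// the weights on its latch branch as round(backedge / exit) + 1.
std::optional<uint64_t> estimateTripCount(uint64_t BackedgeWeight,
                                          uint64_t ExitWeight) {
  // Assumption of this sketch: with no recorded exits there is no
  // meaningful estimate.
  if (ExitWeight == 0)
    return std::nullopt;
  // Divide, rounding to the nearest integer.
  uint64_t BackedgeTaken = (BackedgeWeight + ExitWeight / 2) / ExitWeight;
  // The header runs once more than the backedge is taken.
  return BackedgeTaken + 1;
}

// Weight pairs taken from the updated tests in this series, written here as
// (backedge weight, exit weight).
void checkAgainstTestValues() {
  assert(estimateTripCount(127, 1).value() == 128u);
  assert(estimateTripCount(307, 1).value() == 308u);
  assert(estimateTripCount(2, 1).value() == 3u);
}
```

The weight pairs in `checkAgainstTestValues` mirror the
`branch_weights` and `llvm.loop.estimated_trip_count` values that
appear together in the updated LoopVectorize tests.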
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip
count but later passes can transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the
metadata, but eventually that should be fixed. Until then, if the new
metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it
and tries again to estimate the trip count from the loop's
`branch_weights` metadata.
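The lookup order described above can be sketched as follows. This is
a standalone model under stated assumptions, not the code in
LoopUtils.cpp: `LoopModel` and `lookupEstimatedTripCount` are
hypothetical names, and the three fields stand in for the metadata
node, its optional value, and whatever estimate the `branch_weights`
metadata would yield.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the loop state relevant to the lookup (not LLVM's
// data structures).
struct LoopModel {
  // Is llvm.loop.estimated_trip_count attached to the loop at all?
  bool HasTripCountMetadata = false;
  // The value stored in that metadata, if it has one.
  std::optional<uint64_t> RecordedTripCount;
  // The estimate derivable from branch_weights, if the loop's form allows it.
  std::optional<uint64_t> WeightsEstimate;
};

// Prefer the recorded value. If the metadata is missing or has no value,
// fall back to the estimate from branch_weights.
std::optional<uint64_t> lookupEstimatedTripCount(const LoopModel &L) {
  if (L.HasTripCountMetadata && L.RecordedTripCount)
    return L.RecordedTripCount;
  return L.WeightsEstimate;
}

void checkLookupOrder() {
  LoopModel WithValue{true, 8, 10};
  LoopModel NoValue{true, std::nullopt, 10};
  LoopModel Missing{false, std::nullopt, 10};
  assert(lookupEstimatedTripCount(WithValue).value() == 8u);
  assert(lookupEstimatedTripCount(NoValue).value() == 10u);
  assert(lookupEstimatedTripCount(Missing).value() == 10u);
}
```

In this model, the metadata without a value and the metadata missing
altogether behave identically for the lookup, which matches the
behavior this commit message describes.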
---
llvm/docs/LangRef.rst | 61 +++++
llvm/include/llvm/Analysis/LoopInfo.h | 10 +-
.../Instrumentation/PGOEstimateTripCounts.h | 24 ++
.../include/llvm/Transforms/Utils/LoopUtils.h | 85 +++++--
llvm/lib/Analysis/LoopInfo.cpp | 10 +-
llvm/lib/Passes/PassBuilder.cpp | 1 +
llvm/lib/Passes/PassBuilderPipelines.cpp | 10 +-
llvm/lib/Passes/PassRegistry.def | 1 +
.../Transforms/Instrumentation/CMakeLists.txt | 1 +
.../Instrumentation/PGOEstimateTripCounts.cpp | 45 ++++
llvm/lib/Transforms/Utils/LoopUtils.cpp | 186 ++++++++++----
...-pm-thinlto-postlink-samplepgo-defaults.ll | 31 ++-
.../new-pm-thinlto-prelink-pgo-defaults.ll | 6 +
...w-pm-thinlto-prelink-samplepgo-defaults.ll | 26 +-
.../LoopVectorize/AArch64/check-prof-info.ll | 54 ++--
.../Transforms/LoopVectorize/X86/pr81872.ll | 14 +-
.../LoopVectorize/branch-weights.ll | 56 ++--
.../PGOProfile/pgo-estimate-trip-counts.ll | 240 ++++++++++++++++++
18 files changed, 724 insertions(+), 137 deletions(-)
create mode 100644 llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
create mode 100644 llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
create mode 100644 llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8ea850af7a69b..5c5c6ec96cfc7 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7933,6 +7933,67 @@ The attributes in this metadata is added to all followup loops of the
loop distribution pass. See
:ref:`Transformation Metadata <transformation-metadata>` for details.
+'``llvm.loop.estimated_trip_count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata records an estimated trip count for the loop. The first operand
+is the string ``llvm.loop.estimated_trip_count``. The second operand is an
+integer specifying the count, which might be omitted for the reasons described
+below. For example:
+
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.estimated_trip_count", i32 8}
+ !1 = !{!"llvm.loop.estimated_trip_count"}
+
+Purpose
+"""""""
+
+A loop's estimated trip count is an estimate of the average number of loop
+iterations (specifically, the number of times the loop's header executes) each
+time execution reaches the loop. It is usually only an estimate based on, for
+example, profile data. The actual number of iterations might vary widely.
+
+The estimated trip count serves as a parameter for various loop transformations
+and typically helps estimate transformation cost. For example, it can help
+determine how many iterations to peel or how aggressively to unroll.
+
+Initialization and Maintenance
+""""""""""""""""""""""""""""""
+
+The ``pgo-estimate-trip-counts`` pass typically runs immediately after profile
+ingestion to add this metadata to all loops. It estimates each loop's trip
+count from the loop's ``branch_weights`` metadata. This way of initially
+estimating trip counts appears to be useful for the passes that consume them.
+
+As passes transform existing loops and create new loops, they must be free to
+update and create ``branch_weights`` metadata to maintain accurate block
+frequencies. Trip counts estimated from this new ``branch_weights`` metadata
+are not necessarily useful to the passes that consume them. In general, when
+passes transform and create loops, they should separately estimate new trip
+counts from previously estimated trip counts, and they should record them by
+creating or updating this metadata. For this or any other work involving
+estimated trip counts, passes should always call
+``llvm::getLoopEstimatedTripCount`` and ``llvm::setLoopEstimatedTripCount``.
+
+Missing Metadata and Values
+"""""""""""""""""""""""""""
+
+If the current implementation of ``pgo-estimate-trip-counts`` cannot estimate a
+trip count from the loop's ``branch_weights`` metadata due to the loop's form or
+due to missing profile data, it creates this metadata for the loop but omits the
+value. This situation is currently common (e.g., the LLVM IR loop that Clang
+emits for a simple C ``for`` loop). A later pass (e.g., ``loop-rotate``) might
+modify the loop's form in a way that enables estimating its trip count even if
+those modifications provably never impact the actual number of loop iterations.
+That later pass should then add an appropriate value to the metadata.
+
+However, not all such passes currently do so. Thus, if this metadata has no
+value, ``llvm::getLoopEstimatedTripCount`` will disregard it and estimate the
+trip count from the loop's ``branch_weights`` metadata. It does the same when
+the metadata is missing altogether, perhaps because ``pgo-estimate-trip-counts``
+was not specified in a minimal pass list to a tool like ``opt``.
+
'``llvm.licm.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Analysis/LoopInfo.h b/llvm/include/llvm/Analysis/LoopInfo.h
index a7a6a2753709c..a06be573b5e01 100644
--- a/llvm/include/llvm/Analysis/LoopInfo.h
+++ b/llvm/include/llvm/Analysis/LoopInfo.h
@@ -637,9 +637,13 @@ LLVM_ABI std::optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,
/// Returns true if Name is applied to TheLoop and enabled.
LLVM_ABI bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name);
-/// Find named metadata for a loop with an integer value.
-LLVM_ABI std::optional<int> getOptionalIntLoopAttribute(const Loop *TheLoop,
- StringRef Name);
+/// Find named metadata for a loop with an integer value. Return
+/// \c std::nullopt if the metadata has no value or is missing altogether. If
+/// \p Missing, set \c *Missing to indicate whether the metadata is missing
+/// altogether.
+LLVM_ABI std::optional<int>
+getOptionalIntLoopAttribute(const Loop *TheLoop, StringRef Name,
+ bool *Missing = nullptr);
/// Find named metadata for a loop with an integer value. Return \p Default if
/// not set.
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
new file mode 100644
index 0000000000000..1b35c1c77e5c3
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
@@ -0,0 +1,24 @@
+//===- PGOEstimateTripCounts.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
+#define LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct PGOEstimateTripCountsPass
+ : public PassInfoMixin<PGOEstimateTripCountsPass> {
+ PGOEstimateTripCountsPass() {}
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index e4d2f9d191707..7d03fb0d81e4c 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -316,28 +316,73 @@ LLVM_ABI TransformationMode hasDistributeTransformation(const Loop *L);
LLVM_ABI TransformationMode hasLICMVersioningTransformation(const Loop *L);
/// @}
-/// Set input string into loop metadata by keeping other values intact.
-/// If the string is already in loop metadata update value if it is
-/// different.
-LLVM_ABI void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
- unsigned V = 0);
-
-/// Returns a loop's estimated trip count based on branch weight metadata.
-/// In addition if \p EstimatedLoopInvocationWeight is not null it is
-/// initialized with weight of loop's latch leading to the exit.
-/// Returns a valid positive trip count, saturated at UINT_MAX, or std::nullopt
-/// when a meaningful estimate cannot be made.
+/// Set the string \p MDString into the loop metadata of \p TheLoop while
+/// keeping other loop metadata intact. Set \p *V as its value, or set it
+/// without a value if \p V is \c std::nullopt to indicate the value is unknown.
+/// If \p MDString is already in the loop metadata, update it if its value (or
+/// lack of value) is different. Return true if metadata was changed.
+LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
+ std::optional<unsigned> V = 0);
+
+/// Return either:
+/// - The value of \c llvm.loop.estimated_trip_count from the loop metadata of
+/// \p L, if that metadata is present and has a value.
+/// - Else, a new estimate of the trip count from the latch branch weights of
+/// \p L, if the estimation's implementation is able to handle the loop form
+/// of \p L (e.g., \p L must have a latch block that controls the loop exit).
+/// - Else, \c std::nullopt.
+///
+/// An estimated trip count is always a valid positive trip count, saturated at
+/// \c UINT_MAX.
+///
+/// Via \c LLVM_DEBUG, emit diagnostics that include "WARNING" when the metadata
+/// is in an unexpected state as that indicates some transformation has
+/// corrupted it. If \p DbgForInit, expect the metadata to be missing.
+/// Otherwise, expect the metadata to be present, and expect it to have no value
+/// only if the trip count is currently inestimable from the latch branch
+/// weights.
+///
+/// In addition, if \p EstimatedLoopInvocationWeight, then either:
+/// - Set \p *EstimatedLoopInvocationWeight to the weight of the latch's branch
+/// to the loop exit.
+/// - Do not set it and return \c std::nullopt if the current implementation
+/// cannot compute that weight (e.g., if \p L does not have a latch block that
+/// controls the loop exit) or the weight is zero (because zero cannot be
+/// used to compute new branch weights that reflect the estimated trip count).
+///
+/// TODO: Eventually, once all passes have migrated away from setting branch
+/// weights to indicate estimated trip counts, this function will drop the
+/// \p EstimatedLoopInvocationWeight parameter.
LLVM_ABI std::optional<unsigned>
getLoopEstimatedTripCount(Loop *L,
- unsigned *EstimatedLoopInvocationWeight = nullptr);
-
-/// Set a loop's branch weight metadata to reflect that loop has \p
-/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
-/// through latch. Returns true if metadata is successfully updated, false
-/// otherwise. Note that loop must have a latch block which controls loop exit
-/// in order to succeed.
-LLVM_ABI bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedLoopInvocationWeight);
+ unsigned *EstimatedLoopInvocationWeight = nullptr,
+ bool DbgForInit = false);
+
+/// Set \c llvm.loop.estimated_trip_count with the value \c *EstimatedTripCount
+/// in the loop metadata of \p L, or set it without a value if
+/// \c !EstimatedTripCount to indicate that \c getLoopEstimatedTripCount cannot
+/// estimate the trip count from latch branch weights. If
+/// \c !EstimatedTripCount but \c getLoopEstimatedTripCount can estimate the
+/// trip count, future calls to \c getLoopEstimatedTripCount will diagnose the
+/// metadata as corrupt.
+///
+/// In addition, if \p EstimatedLoopInvocationWeight, set the branch weight
+/// metadata of \p L to reflect that \p L has an estimated
+/// \c *EstimatedTripCount iterations and has \c *EstimatedLoopInvocationWeight
+/// exit weight through the loop's latch.
+///
+/// Return false if \c llvm.loop.estimated_trip_count was already set according
+/// to \p EstimatedTripCount and so was not updated. Return false if
+/// \p EstimatedLoopInvocationWeight is given but the branch weight metadata
+/// could not be updated (e.g., if \p L does not have a latch block that
+/// controls the loop exit). Otherwise, return true.
+///
+/// TODO: Eventually, once all passes have migrated away from setting branch
+/// weights to indicate estimated trip counts, this function will drop the
+/// \p EstimatedLoopInvocationWeight parameter.
+LLVM_ABI bool setLoopEstimatedTripCount(
+ Loop *L, std::optional<unsigned> EstimatedTripCount,
+ std::optional<unsigned> EstimatedLoopInvocationWeight = std::nullopt);
/// Check inner loop (L) backedge count is known to be invariant on all
/// iterations of its outer loop. If the loop has no parent, this is trivially
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 901cfe03ecd33..ba2c30b3c4764 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -1112,9 +1112,13 @@ bool llvm::getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {
}
std::optional<int> llvm::getOptionalIntLoopAttribute(const Loop *TheLoop,
- StringRef Name) {
- const MDOperand *AttrMD =
- findStringMetadataForLoop(TheLoop, Name).value_or(nullptr);
+ StringRef Name,
+ bool *Missing) {
+ std::optional<const MDOperand *> AttrMDOpt =
+ findStringMetadataForLoop(TheLoop, Name);
+ if (Missing)
+ *Missing = !AttrMDOpt;
+ const MDOperand *AttrMD = AttrMDOpt.value_or(nullptr);
if (!AttrMD)
return std::nullopt;
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 572e5f19a1972..f593c5bba7573 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -248,6 +248,7 @@
#include "llvm/Transforms/Instrumentation/NumericalStabilitySanitizer.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
#include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Instrumentation/RealtimeSanitizer.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 98821bb1408a7..fc0d88e710426 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -80,6 +80,7 @@
#include "llvm/Transforms/Instrumentation/MemProfUse.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
#include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Scalar/ADCE.h"
@@ -1268,8 +1269,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(MemProfUsePass(PGOOpt->MemoryProfile, PGOOpt->FS));
if (PGOOpt && (PGOOpt->Action == PGOOptions::IRUse ||
- PGOOpt->Action == PGOOptions::SampleUse))
+ PGOOpt->Action == PGOOptions::SampleUse)) {
MPM.addPass(PGOForceFunctionAttrsPass(PGOOpt->ColdOptType));
+ // TODO: Is this the right place for this pass? Should we enable it in any
+ // other case, such as when __builtin_expect_with_probability or
+ // __builtin_expect appears in the source code but profiles are not read?
+ MPM.addPass(PGOEstimateTripCountsPass());
+ }
MPM.addPass(AlwaysInlinerPass(/*InsertLifetimeIntrinsics=*/true));
@@ -2355,4 +2361,4 @@ AAManager PassBuilder::buildDefaultAAPipeline() {
bool PassBuilder::isInstrumentedPGOUse() const {
return (PGOOpt && PGOOpt->Action == PGOOptions::IRUse) ||
!UseCtxProfile.empty();
-}
\ No newline at end of file
+}
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 96250772da4a0..6e7ac959f57f4 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -124,6 +124,7 @@ MODULE_PASS("openmp-opt", OpenMPOptPass())
MODULE_PASS("openmp-opt-postlink",
OpenMPOptPass(ThinOrFullLTOPhase::FullLTOPostLink))
MODULE_PASS("partial-inliner", PartialInlinerPass())
+MODULE_PASS("pgo-estimate-trip-counts", PGOEstimateTripCountsPass())
MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
diff --git a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
index 15fd421a41b0f..0a97ed4b51e69 100644
--- a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -16,6 +16,7 @@ add_llvm_component_library(LLVMInstrumentation
LowerAllowCheckPass.cpp
PGOCtxProfFlattening.cpp
PGOCtxProfLowering.cpp
+ PGOEstimateTripCounts.cpp
PGOForceFunctionAttrs.cpp
PGOInstrumentation.cpp
PGOMemOPSizeOpt.cpp
diff --git a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
new file mode 100644
index 0000000000000..762aca0b897ce
--- /dev/null
+++ b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
@@ -0,0 +1,45 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "pgo-estimate-trip-counts"
+
+static bool runOnLoop(Loop *L) {
+ bool MadeChange = false;
+ std::optional<unsigned> TC = getLoopEstimatedTripCount(
+ L, /*EstimatedLoopInvocationWeight=*/nullptr, /*DbgForInit=*/true);
+ MadeChange |= setLoopEstimatedTripCount(L, TC);
+ for (Loop *SL : *L)
+ MadeChange |= runOnLoop(SL);
+ return MadeChange;
+}
+
+PreservedAnalyses PGOEstimateTripCountsPass::run(Module &M,
+ ModuleAnalysisManager &AM) {
+ FunctionAnalysisManager &FAM =
+ AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
+ bool MadeChange = false;
+ LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": start\n");
+ for (Function &F : M) {
+ if (F.isDeclaration())
+ continue;
+ LoopInfo *LI = &FAM.getResult<LoopAnalysis>(F);
+ if (!LI)
+ continue;
+ for (Loop *L : *LI)
+ MadeChange |= runOnLoop(L);
+ }
+ LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": end\n");
+ return MadeChange ? PreservedAnalyses::none() : PreservedAnalyses::all();
+}
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 200d1fb854155..50530590cf368 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -54,6 +54,8 @@ using namespace llvm::PatternMatch;
static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
+static const char *LLVMLoopEstimatedTripCount =
+ "llvm.loop.estimated_trip_count";
bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
MemorySSAUpdater *MSSAU,
@@ -201,34 +203,40 @@ void llvm::initializeLoopPassPass(PassRegistry &Registry) {
}
/// Create MDNode for input string.
-static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name, unsigned V) {
+static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name,
+ std::optional<unsigned> V) {
LLVMContext &Context = TheLoop->getHeader()->getContext();
- Metadata *MDs[] = {
- MDString::get(Context, Name),
- ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Context), V))};
- return MDNode::get(Context, MDs);
+ if (V) {
+ Metadata *MDs[] = {MDString::get(Context, Name),
+ ConstantAsMetadata::get(
+ ConstantInt::get(Type::getInt32Ty(Context), *V))};
+ return MDNode::get(Context, MDs);
+ }
+ return MDNode::get(Context, {MDString::get(Context, Name)});
}
-/// Set input string into loop metadata by keeping other values intact.
-/// If the string is already in loop metadata update value if it is
-/// different.
-void llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
- unsigned V) {
+bool llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
+ std::optional<unsigned> V) {
SmallVector<Metadata *, 4> MDs(1);
// If the loop already has metadata, retain it.
MDNode *LoopID = TheLoop->getLoopID();
if (LoopID) {
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
MDNode *Node = cast<MDNode>(LoopID->getOperand(i));
- // If it is of form key = value, try to parse it.
- if (Node->getNumOperands() == 2) {
+ // If it is of form key [= value], try to parse it.
+ unsigned NumOps = Node->getNumOperands();
+ if (NumOps == 1 || NumOps == 2) {
MDString *S = dyn_cast<MDString>(Node->getOperand(0));
if (S && S->getString() == StringMD) {
- ConstantInt *IntMD =
- mdconst::extract_or_null<ConstantInt>(Node->getOperand(1));
- if (IntMD && IntMD->getSExtValue() == V)
- // It is already in place. Do nothing.
- return;
+ // If it is already in place, do nothing.
+ if (NumOps == 2 && V) {
+ ConstantInt *IntMD =
+ mdconst::extract_or_null<ConstantInt>(Node->getOperand(1));
+ if (IntMD && IntMD->getSExtValue() == *V)
+ return false;
+ } else if (NumOps == 1 && !V) {
+ return false;
+ }
// We need to update the value, so just skip it here and it will
// be added after copying other existed nodes.
continue;
@@ -245,6 +253,7 @@ void llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);
TheLoop->setLoopID(NewLoopID);
+ return true;
}
std::optional<ElementCount>
@@ -804,26 +813,48 @@ static BranchInst *getExpectedExitLoopLatchBranch(Loop *L) {
return LatchBR;
}
-/// Return the estimated trip count for any exiting branch which dominates
-/// the loop latch.
-static std::optional<unsigned> getEstimatedTripCount(BranchInst *ExitingBranch,
- Loop *L,
- uint64_t &OrigExitWeight) {
+struct DbgLoop {
+ const Loop *L;
+ explicit DbgLoop(const Loop *L) : L(L) {}
+};
+static inline raw_ostream &operator<<(raw_ostream &OS, DbgLoop D) {
+ OS << "function ";
+ D.L->getHeader()->getParent()->printAsOperand(OS, /*PrintType=*/false);
+ return OS << " " << *D.L;
+}
+
+static std::optional<unsigned> estimateLoopTripCount(Loop *L) {
+ // Currently we take the estimated exit count only from the loop latch,
+ // ignoring other exiting blocks. This can overestimate the trip count
+ // if we exit through another exit, but can never underestimate it.
+ // TODO: incorporate information from other exits
+ BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L);
+ if (!ExitingBranch) {
+ LLVM_DEBUG(dbgs() << "estimateLoopTripCount: Failed to find exiting "
+ << "latch branch of required form in " << DbgLoop(L)
+ << "\n");
+ return std::nullopt;
+ }
+
// To estimate the number of times the loop body was executed, we want to
// know the number of times the backedge was taken, vs. the number of times
// we exited the loop.
uint64_t LoopWeight, ExitWeight;
- if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
+ if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight)) {
+ LLVM_DEBUG(dbgs() << "estimateLoopTripCount: Failed to extract branch "
+ << "weights for " << DbgLoop(L) << "\n");
return std::nullopt;
+ }
if (L->contains(ExitingBranch->getSuccessor(1)))
std::swap(LoopWeight, ExitWeight);
- if (!ExitWeight)
+ if (!ExitWeight) {
// Don't have a way to return predicated infinite
+ LLVM_DEBUG(dbgs() << "estimateLoopTripCount: Failed because of zero exit "
+ << "probability for " << DbgLoop(L) << "\n");
return std::nullopt;
-
- OrigExitWeight = ExitWeight;
+ }
// Estimated exit count is a ratio of the loop weight by the weight of the
// edge exiting the loop, rounded to nearest.
@@ -834,33 +865,86 @@ static std::optional<unsigned> getEstimatedTripCount(BranchInst *ExitingBranch,
return std::numeric_limits<unsigned>::max();
// Estimated trip count is one plus estimated exit count.
- return ExitCount + 1;
+ uint64_t TC = ExitCount + 1;
+ LLVM_DEBUG(dbgs() << "estimateLoopTripCount: estimated trip count of " << TC
+ << " for " << DbgLoop(L) << "\n");
+ return TC;
}
-std::optional<unsigned>
-llvm::getLoopEstimatedTripCount(Loop *L,
- unsigned *EstimatedLoopInvocationWeight) {
- // Currently we take the estimate exit count only from the loop latch,
- // ignoring other exiting blocks. This can overestimate the trip count
- // if we exit through another exit, but can never underestimate it.
- // TODO: incorporate information from other exits
- if (BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L)) {
- uint64_t ExitWeight;
- if (std::optional<uint64_t> EstTripCount =
- getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
- if (EstimatedLoopInvocationWeight)
- *EstimatedLoopInvocationWeight = ExitWeight;
- return *EstTripCount;
+std::optional<unsigned> llvm::getLoopEstimatedTripCount(
+ Loop *L, unsigned *EstimatedLoopInvocationWeight, bool DbgForInit) {
+ // If requested, either compute *EstimatedLoopInvocationWeight or return
+ // nullopt if it cannot be computed.
+ //
+ // TODO: Eventually, once all passes have migrated away from setting branch
+ // weights to indicate estimated trip counts, this function will drop the
+ // EstimatedLoopInvocationWeight parameter.
+ if (EstimatedLoopInvocationWeight) {
+ if (BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L)) {
+ uint64_t LoopWeight, ExitWeight;
+ if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
+ return std::nullopt;
+ if (L->contains(ExitingBranch->getSuccessor(1)))
+ std::swap(LoopWeight, ExitWeight);
+ if (!ExitWeight)
+ return std::nullopt;
+ *EstimatedLoopInvocationWeight = ExitWeight;
}
}
- return std::nullopt;
+
+ // Return the estimated trip count from metadata unless the metadata is
+ // missing or has no value.
+ bool Missing;
+ if (auto TC = getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount,
+ &Missing)) {
+ LLVM_DEBUG(dbgs() << "getLoopEstimatedTripCount: "
+ << LLVMLoopEstimatedTripCount << " metadata has trip "
+ << "count of " << *TC << " for " << DbgLoop(L) << "\n");
+ return TC;
+ }
+
+ // Estimate the trip count from latch branch weights.
+ std::optional<unsigned> TC = estimateLoopTripCount(L);
+ if (DbgForInit) {
+ // We expect no existing metadata as we are responsible for creating it.
+ LLVM_DEBUG(dbgs() << (Missing ? "" : "WARNING: ")
+ << "getLoopEstimatedTripCount: "
+ << LLVMLoopEstimatedTripCount << " metadata "
+ << (Missing ? "" : "not ") << "missing as expected "
+ << "during its init for " << DbgLoop(L) << "\n");
+ } else if (Missing) {
+ // We expect that metadata was already created.
+ LLVM_DEBUG(dbgs() << "WARNING: getLoopEstimatedTripCount: "
+ << LLVMLoopEstimatedTripCount << " metadata missing for "
+ << DbgLoop(L) << "\n");
+ } else {
+ // If the trip count is estimable, the value should have been added already.
+ LLVM_DEBUG(dbgs() << (TC ? "WARNING: " : "")
+ << "getLoopEstimatedTripCount: "
+ << LLVMLoopEstimatedTripCount << " metadata "
+ << (TC ? "incorrectly " : "correctly ")
+ << "indicates trip count is inestimable for "
+ << DbgLoop(L) << "\n");
+ }
+ return TC;
}
-bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
- unsigned EstimatedloopInvocationWeight) {
- // At the moment, we currently support changing the estimate trip count of
- // the latch branch only. We could extend this API to manipulate estimated
- // trip counts for any exit.
+bool llvm::setLoopEstimatedTripCount(
+ Loop *L, std::optional<unsigned> EstimatedTripCount,
+ std::optional<unsigned> EstimatedloopInvocationWeight) {
+ // Set the metadata.
+ bool Updated = addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount,
+ EstimatedTripCount);
+ if (!EstimatedTripCount || !EstimatedloopInvocationWeight)
+ return Updated;
+
+ // At the moment, we support changing the estimated trip count in the latch
+ // branch's branch weights only. We could extend this API to manipulate
+ // estimated trip counts for any exit.
+ //
+ // TODO: Eventually, once all passes have migrated away from setting branch
+ // weights to indicate estimated trip counts, we will not set branch weights
+ // here at all.
BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
if (!LatchBranch)
return false;
@@ -869,9 +953,9 @@ bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;
- if (EstimatedTripCount > 0) {
- LatchExitWeight = EstimatedloopInvocationWeight;
- BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
+ if (*EstimatedTripCount > 0) {
+ LatchExitWeight = *EstimatedloopInvocationWeight;
+ BackedgeTakenWeight = (*EstimatedTripCount - 1) * LatchExitWeight;
}
// Make a swap if back edge is taken when condition is "false".
@@ -885,7 +969,7 @@ bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
LLVMContext::MD_prof,
MDB.createBranchWeights(BackedgeTakenWeight, LatchExitWeight));
- return true;
+ return Updated;
}
bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 38b7890682783..516f1bf972429 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -59,38 +59,64 @@
; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
-
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LazyCallGraphAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis on foo
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis on foo
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O23SZ-NEXT: Running analysis: BasicAA on foo
+; CHECK-O23SZ-NEXT: Running analysis: ScopedNoAliasAA on foo
+; CHECK-O23SZ-NEXT: Running analysis: TypeBasedAA on foo
+; CHECK-O23SZ-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
+; CHECK-O1-NEXT: Running analysis: BasicAA on foo
+; CHECK-O1-NEXT: Running analysis: ScopedNoAliasAA on foo
+; CHECK-O1-NEXT: Running analysis: TypeBasedAA on foo
+; CHECK-O1-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O23SZ-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
+; CHECK-O1-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O3-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -155,6 +181,7 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on bar
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index f6a9406596803..e26e1eafbfb65 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -86,7 +86,12 @@
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
@@ -101,6 +106,7 @@
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 48a9433d24999..0f8e1c770a4dc 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -66,28 +66,41 @@
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LazyCallGraphAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis on [module]
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis on foo
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis on foo
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O-NEXT: Running analysis: BasicAA on foo
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
+; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -95,7 +108,17 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O23SZ-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
+; CHECK-O1-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O1-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O3-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -155,6 +178,7 @@
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on bar
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: AnnotationRemarksPass on foo
; CHECK-O-NEXT: Running pass: CanonicalizeAliasesPass
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
index 1f619898ea788..8a782e827701d 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
@@ -20,11 +20,11 @@ define void @_Z3foov() {
; CHECK-V1-IC1: [[VECTOR_BODY]]:
; CHECK-V1-IC1: br i1 [[TMP10:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP1:![0-9]+]]
; CHECK-V1-IC1: [[MIDDLE_BLOCK]]:
-; CHECK-V1-IC1: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF4:![0-9]+]]
+; CHECK-V1-IC1: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
; CHECK-V1-IC1: [[SCALAR_PH]]:
; CHECK-V1-IC1: br label %[[FOR_BODY:.*]]
; CHECK-V1-IC1: [[FOR_BODY]]:
-; CHECK-V1-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF5:![0-9]+]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-V1-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK-V1-IC1: [[FOR_COND_CLEANUP]]:
;
; CHECK-V2-IC1-LABEL: define void @_Z3foov(
@@ -36,11 +36,11 @@ define void @_Z3foov() {
; CHECK-V2-IC1: [[VECTOR_BODY]]:
; CHECK-V2-IC1: br i1 [[TMP4:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK-V2-IC1: [[MIDDLE_BLOCK]]:
-; CHECK-V2-IC1: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC1: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]], !prof [[PROF6:![0-9]+]]
; CHECK-V2-IC1: [[SCALAR_PH]]:
; CHECK-V2-IC1: br label %[[FOR_BODY:.*]]
; CHECK-V2-IC1: [[FOR_BODY]]:
-; CHECK-V2-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC1: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK-V2-IC1: [[FOR_COND_CLEANUP]]:
;
; CHECK-V2-IC4-LABEL: define void @_Z3foov(
@@ -54,19 +54,19 @@ define void @_Z3foov() {
; CHECK-V2-IC4: [[VECTOR_BODY]]:
; CHECK-V2-IC4: br i1 [[TMP12:%.*]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK-V2-IC4: [[MIDDLE_BLOCK]]:
-; CHECK-V2-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF5:![0-9]+]]
+; CHECK-V2-IC4: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF6:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-V2-IC4: br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF6:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[MIN_EPILOG_ITERS_CHECK:%.*]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF7:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_PH]]:
; CHECK-V2-IC4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; CHECK-V2-IC4: [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-V2-IC4: br i1 [[TMP23:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[TMP23:%.*]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
-; CHECK-V2-IC4: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[CMP_N:%.*]], label %[[FOR_COND_CLEANUP]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF10:![0-9]+]]
; CHECK-V2-IC4: [[VEC_EPILOG_SCALAR_PH]]:
; CHECK-V2-IC4: br label %[[FOR_BODY:.*]]
; CHECK-V2-IC4: [[FOR_BODY]]:
-; CHECK-V2-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-V2-IC4: br i1 [[EXITCOND:%.*]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !prof [[PROF11:![0-9]+]], !llvm.loop [[LOOP12:![0-9]+]]
; CHECK-V2-IC4: [[FOR_COND_CLEANUP]]:
;
entry:
@@ -89,31 +89,37 @@ for.cond.cleanup: ; preds = %for.body
!0 = !{!"branch_weights", i32 1, i32 1023}
;.
; CHECK-V1-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
-; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK-V1-IC1: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
; CHECK-V1-IC1: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V1-IC1: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V1-IC1: [[PROF4]] = !{!"branch_weights", i32 1, i32 7}
-; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V1-IC1: [[LOOP6]] = distinct !{[[LOOP6]], [[META3]], [[META2]]}
+; CHECK-V1-IC1: [[META4]] = !{!"llvm.loop.estimated_trip_count", i32 128}
+; CHECK-V1-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 7}
+; CHECK-V1-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V1-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META3]], [[META2]]}
+; CHECK-V1-IC1: [[META8]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
; CHECK-V2-IC1: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK-V2-IC1: [[PROF1]] = !{!"branch_weights", i32 1, i32 255}
-; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC1: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK-V2-IC1: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V2-IC1: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V2-IC1: [[PROF5]] = !{!"branch_weights", i32 1, i32 3}
-; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V2-IC1: [[LOOP7]] = distinct !{[[LOOP7]], [[META4]], [[META3]]}
+; CHECK-V2-IC1: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 256}
+; CHECK-V2-IC1: [[PROF6]] = !{!"branch_weights", i32 1, i32 3}
+; CHECK-V2-IC1: [[PROF7]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC1: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META4]], [[META3]]}
+; CHECK-V2-IC1: [[META9]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
; CHECK-V2-IC4: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK-V2-IC4: [[PROF1]] = !{!"branch_weights", i32 1, i32 63}
-; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK-V2-IC4: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK-V2-IC4: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK-V2-IC4: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK-V2-IC4: [[PROF5]] = !{!"branch_weights", i32 1, i32 15}
-; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 2, i32 0}
-; CHECK-V2-IC4: [[LOOP7]] = distinct !{[[LOOP7]], [[META3]], [[META4]]}
-; CHECK-V2-IC4: [[PROF8]] = !{!"branch_weights", i32 1, i32 1}
-; CHECK-V2-IC4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK-V2-IC4: [[LOOP10]] = distinct !{[[LOOP10]], [[META4]], [[META3]]}
+; CHECK-V2-IC4: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 64}
+; CHECK-V2-IC4: [[PROF6]] = !{!"branch_weights", i32 1, i32 15}
+; CHECK-V2-IC4: [[PROF7]] = !{!"branch_weights", i32 2, i32 0}
+; CHECK-V2-IC4: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META3]], [[META4]]}
+; CHECK-V2-IC4: [[META9]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; CHECK-V2-IC4: [[PROF10]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK-V2-IC4: [[PROF11]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK-V2-IC4: [[LOOP12]] = distinct !{[[LOOP12]], [[META9]], [[META4]], [[META3]]}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
index 08adfdd4793eb..8993032fc6d72 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
@@ -47,7 +47,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[AND:%.*]] = and i64 [[IV]], 1
; CHECK-NEXT: [[ICMP17:%.*]] = icmp eq i64 [[AND]], 0
-; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF5:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP17]], label [[BB18:%.*]], label [[LOOP_LATCH]], !prof [[PROF6:![0-9]+]]
; CHECK: bb18:
; CHECK-NEXT: [[OR:%.*]] = or disjoint i64 [[IV]], 1
; CHECK-NEXT: [[GETELEMENTPTR19:%.*]] = getelementptr inbounds i64, ptr [[ARR]], i64 [[OR]]
@@ -56,7 +56,7 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nsw i64 [[IV]], -1
; CHECK-NEXT: [[ICMP22:%.*]] = icmp eq i64 [[IV_NEXT]], 90
-; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF6:![0-9]+]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP22]], label [[BB6]], label [[LOOP_HEADER]], !prof [[PROF7:![0-9]+]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: bb6:
; CHECK-NEXT: ret void
;
@@ -97,10 +97,12 @@ attributes #0 = {"target-cpu"="haswell" "target-features"="+avx2" }
;.
; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 127}
; CHECK: [[PROF1]] = !{!"branch_weights", i32 1, i32 23}
-; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]]}
+; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META3:![0-9]+]], [[META4:![0-9]+]], [[META5:![0-9]+]]}
; CHECK: [[META3]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[PROF5]] = !{!"branch_weights", i32 1, i32 1}
-; CHECK: [[PROF6]] = !{!"branch_weights", i32 0, i32 0}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META4]], [[META3]]}
+; CHECK: [[META5]] = !{!"llvm.loop.estimated_trip_count", i32 24}
+; CHECK: [[PROF6]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK: [[PROF7]] = !{!"branch_weights", i32 0, i32 0}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META9:![0-9]+]], [[META4]], [[META3]]}
+; CHECK: [[META9]] = !{!"llvm.loop.estimated_trip_count", i32 0}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/branch-weights.ll b/llvm/test/Transforms/LoopVectorize/branch-weights.ll
index 6892709f085f7..08bc920cef0e0 100644
--- a/llvm/test/Transforms/LoopVectorize/branch-weights.ll
+++ b/llvm/test/Transforms/LoopVectorize/branch-weights.ll
@@ -34,23 +34,23 @@ define void @f0(i8 %n, i32 %len, ptr %p) !prof !0 {
; MAINVF4IC1_EPI4: br i1 [[TMP8]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]
; MAINVF4IC1_EPI4: [[MIDDLE_BLOCK]]:
; MAINVF4IC1_EPI4: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF7:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF8:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_ITER_CHECK]]:
; MAINVF4IC1_EPI4: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i32 [[N_VEC_REMAINING:%.*]], 4
-; MAINVF4IC1_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF8:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF9:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_PH]]:
; MAINVF4IC1_EPI4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_VECTOR_BODY]]:
; MAINVF4IC1_EPI4: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT6:%.*]], [[N_VEC3:%.*]]
-; MAINVF4IC1_EPI4: br i1 [[TMP12]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[TMP12]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF10:![0-9]+]], !llvm.loop [[LOOP11:![0-9]+]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; MAINVF4IC1_EPI4: [[CMP_N8:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC3]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF7]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF8]]
; MAINVF4IC1_EPI4: [[VEC_EPILOG_SCALAR_PH]]:
; MAINVF4IC1_EPI4: br label %[[LOOP:.*]]
; MAINVF4IC1_EPI4: [[LOOP]]:
; MAINVF4IC1_EPI4: [[CMP_LOOP:%.*]] = icmp ult i32 [[I32:%.*]], [[LEN]]
-; MAINVF4IC1_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF11:![0-9]+]], !llvm.loop [[LOOP12:![0-9]+]]
+; MAINVF4IC1_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF13:![0-9]+]], !llvm.loop [[LOOP14:![0-9]+]]
; MAINVF4IC1_EPI4: [[EXIT_LOOPEXIT]]:
; MAINVF4IC1_EPI4: br label %[[EXIT]]
; MAINVF4IC1_EPI4: [[EXIT]]:
@@ -77,23 +77,23 @@ define void @f0(i8 %n, i32 %len, ptr %p) !prof !0 {
; MAINVF4IC2_EPI4: br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]
; MAINVF4IC2_EPI4: [[MIDDLE_BLOCK]]:
; MAINVF4IC2_EPI4: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF7:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]], !prof [[PROF8:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_ITER_CHECK]]:
; MAINVF4IC2_EPI4: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i32 [[N_VEC_REMAINING:%.*]], 4
-; MAINVF4IC2_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF8:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF9:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_PH]]:
; MAINVF4IC2_EPI4: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_VECTOR_BODY]]:
; MAINVF4IC2_EPI4: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT6:%.*]], [[N_VEC3:%.*]]
-; MAINVF4IC2_EPI4: br i1 [[TMP13]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF9:![0-9]+]], !llvm.loop [[LOOP10:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[TMP13]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !prof [[PROF10:![0-9]+]], !llvm.loop [[LOOP11:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; MAINVF4IC2_EPI4: [[CMP_N8:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC3]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF11:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_N8]], label %[[EXIT_LOOPEXIT]], label %[[VEC_EPILOG_SCALAR_PH]], !prof [[PROF13:![0-9]+]]
; MAINVF4IC2_EPI4: [[VEC_EPILOG_SCALAR_PH]]:
; MAINVF4IC2_EPI4: br label %[[LOOP:.*]]
; MAINVF4IC2_EPI4: [[LOOP]]:
; MAINVF4IC2_EPI4: [[CMP_LOOP:%.*]] = icmp ult i32 [[I32:%.*]], [[LEN]]
-; MAINVF4IC2_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
+; MAINVF4IC2_EPI4: br i1 [[CMP_LOOP]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !prof [[PROF14:![0-9]+]], !llvm.loop [[LOOP15:![0-9]+]]
; MAINVF4IC2_EPI4: [[EXIT_LOOPEXIT]]:
; MAINVF4IC2_EPI4: br label %[[EXIT]]
; MAINVF4IC2_EPI4: [[EXIT]]:
@@ -127,28 +127,34 @@ exit:
; MAINVF4IC1_EPI4: [[PROF1]] = !{!"branch_weights", i32 12, i32 1}
; MAINVF4IC1_EPI4: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
; MAINVF4IC1_EPI4: [[PROF3]] = !{!"branch_weights", i32 1, i32 307}
-; MAINVF4IC1_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
+; MAINVF4IC1_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
; MAINVF4IC1_EPI4: [[META5]] = !{!"llvm.loop.isvectorized", i32 1}
; MAINVF4IC1_EPI4: [[META6]] = !{!"llvm.loop.unroll.runtime.disable"}
-; MAINVF4IC1_EPI4: [[PROF7]] = !{!"branch_weights", i32 1, i32 3}
-; MAINVF4IC1_EPI4: [[PROF8]] = !{!"branch_weights", i32 4, i32 0}
-; MAINVF4IC1_EPI4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; MAINVF4IC1_EPI4: [[LOOP10]] = distinct !{[[LOOP10]], [[META5]], [[META6]]}
-; MAINVF4IC1_EPI4: [[PROF11]] = !{!"branch_weights", i32 2, i32 1}
-; MAINVF4IC1_EPI4: [[LOOP12]] = distinct !{[[LOOP12]], [[META5]]}
+; MAINVF4IC1_EPI4: [[META7]] = !{!"llvm.loop.estimated_trip_count", i32 308}
+; MAINVF4IC1_EPI4: [[PROF8]] = !{!"branch_weights", i32 1, i32 3}
+; MAINVF4IC1_EPI4: [[PROF9]] = !{!"branch_weights", i32 4, i32 0}
+; MAINVF4IC1_EPI4: [[PROF10]] = !{!"branch_weights", i32 0, i32 0}
+; MAINVF4IC1_EPI4: [[LOOP11]] = distinct !{[[LOOP11]], [[META5]], [[META6]], [[META12:![0-9]+]]}
+; MAINVF4IC1_EPI4: [[META12]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; MAINVF4IC1_EPI4: [[PROF13]] = !{!"branch_weights", i32 2, i32 1}
+; MAINVF4IC1_EPI4: [[LOOP14]] = distinct !{[[LOOP14]], [[META15:![0-9]+]], [[META5]]}
+; MAINVF4IC1_EPI4: [[META15]] = !{!"llvm.loop.estimated_trip_count", i32 3}
;.
; MAINVF4IC2_EPI4: [[PROF0]] = !{!"function_entry_count", i64 13}
; MAINVF4IC2_EPI4: [[PROF1]] = !{!"branch_weights", i32 12, i32 1}
; MAINVF4IC2_EPI4: [[PROF2]] = !{!"branch_weights", i32 1, i32 127}
; MAINVF4IC2_EPI4: [[PROF3]] = !{!"branch_weights", i32 1, i32 153}
-; MAINVF4IC2_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
+; MAINVF4IC2_EPI4: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
; MAINVF4IC2_EPI4: [[META5]] = !{!"llvm.loop.isvectorized", i32 1}
; MAINVF4IC2_EPI4: [[META6]] = !{!"llvm.loop.unroll.runtime.disable"}
-; MAINVF4IC2_EPI4: [[PROF7]] = !{!"branch_weights", i32 1, i32 7}
-; MAINVF4IC2_EPI4: [[PROF8]] = !{!"branch_weights", i32 4, i32 0}
-; MAINVF4IC2_EPI4: [[PROF9]] = !{!"branch_weights", i32 0, i32 0}
-; MAINVF4IC2_EPI4: [[LOOP10]] = distinct !{[[LOOP10]], [[META5]], [[META6]]}
-; MAINVF4IC2_EPI4: [[PROF11]] = !{!"branch_weights", i32 1, i32 3}
-; MAINVF4IC2_EPI4: [[PROF12]] = !{!"branch_weights", i32 2, i32 1}
-; MAINVF4IC2_EPI4: [[LOOP13]] = distinct !{[[LOOP13]], [[META5]]}
+; MAINVF4IC2_EPI4: [[META7]] = !{!"llvm.loop.estimated_trip_count", i32 154}
+; MAINVF4IC2_EPI4: [[PROF8]] = !{!"branch_weights", i32 1, i32 7}
+; MAINVF4IC2_EPI4: [[PROF9]] = !{!"branch_weights", i32 4, i32 0}
+; MAINVF4IC2_EPI4: [[PROF10]] = !{!"branch_weights", i32 0, i32 0}
+; MAINVF4IC2_EPI4: [[LOOP11]] = distinct !{[[LOOP11]], [[META5]], [[META6]], [[META12:![0-9]+]]}
+; MAINVF4IC2_EPI4: [[META12]] = !{!"llvm.loop.estimated_trip_count", i32 0}
+; MAINVF4IC2_EPI4: [[PROF13]] = !{!"branch_weights", i32 1, i32 3}
+; MAINVF4IC2_EPI4: [[PROF14]] = !{!"branch_weights", i32 2, i32 1}
+; MAINVF4IC2_EPI4: [[LOOP15]] = distinct !{[[LOOP15]], [[META16:![0-9]+]], [[META5]]}
+; MAINVF4IC2_EPI4: [[META16]] = !{!"llvm.loop.estimated_trip_count", i32 3}
;.
diff --git a/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll b/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
new file mode 100644
index 0000000000000..f3fe8fb373694
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
@@ -0,0 +1,240 @@
+; Check the pgo-estimate-trip-counts pass. Indirectly check
+; llvm::getLoopEstimatedTripCount and llvm::setLoopEstimatedTripCount.
+
+; RUN: opt %s -S -passes=pgo-estimate-trip-counts 2>&1 | \
+; RUN: FileCheck %s -implicit-check-not='{{^[^ ;]*:}}'
+
+; No metadata and trip count is estimable: create metadata with value.
+;
+; CHECK-LABEL: define void @estimable(i32 %n) {
+define void @estimable(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#ESTIMABLE:]]
+ br i1 %cmp, label %body, label %exit, !prof !0
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; No metadata and trip count is inestimable because no branch weights: create
+; metadata with no value.
+;
+; CHECK-LABEL: define void @no_branch_weights(i32 %n) {
+define void @no_branch_weights(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#NO_BRANCH_WEIGHTS:]]
+ br i1 %cmp, label %body, label %exit
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; No metadata and trip count is inestimable because multiple latches: create
+; metadata with no value.
+;
+; CHECK-LABEL: define void @multi_latch(i32 %n, i1 %c) {
+define void @multi_latch(i32 %n, i1 %c) {
+; CHECK: entry:
+entry:
+ br label %head
+
+; CHECK: head:
+head:
+ %i = phi i32 [ 0, %entry ], [ %inc, %latch0], [ %inc, %latch1 ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %latch0, label %exit, !prof !0
+ br i1 %cmp, label %latch0, label %exit, !prof !0
+
+; CHECK: latch0:
+latch0:
+ ; CHECK: br i1 %c, label %head, label %latch1, !prof !0, !llvm.loop ![[#MULTI_LATCH:]]
+ br i1 %c, label %head, label %latch1, !prof !0
+
+; CHECK: latch1:
+latch1:
+ ; CHECK: br label %head
+ br label %head
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; Metadata is already present with value, and trip count is estimable: keep the
+; existing metadata value.
+;
+; CHECK-LABEL: define void @val_estimable(i32 %n) {
+define void @val_estimable(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#VAL_ESTIMABLE:]]
+ br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop !1
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; Metadata is already present with value, and trip count is inestimable: keep
+; the existing metadata value.
+;
+; CHECK-LABEL: define void @val_inestimable(i32 %n) {
+define void @val_inestimable(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#VAL_INESTIMABLE:]]
+ br i1 %cmp, label %body, label %exit, !llvm.loop !3
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; Metadata is already present without value, and trip count is estimable: add
+; new value to metadata.
+;
+; CHECK-LABEL: define void @no_val_estimable(i32 %n) {
+define void @no_val_estimable(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#NO_VAL_ESTIMABLE:]]
+ br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop !5
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; Metadata is already present without value, and trip count is inestimable:
+; leave no value on metadata.
+;
+; CHECK-LABEL: define void @no_val_inestimable(i32 %n) {
+define void @no_val_inestimable(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %body
+
+; CHECK: body:
+body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %body ]
+ %inc = add nsw i32 %i, 1
+ %cmp = icmp slt i32 %inc, %n
+ ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#NO_VAL_INESTIMABLE:]]
+ br i1 %cmp, label %body, label %exit, !llvm.loop !7
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; Check that nested loops are visited.
+;
+; CHECK-LABEL: define void @nested(i32 %n) {
+define void @nested(i32 %n) {
+; CHECK: entry:
+entry:
+ br label %loop0.head
+
+; CHECK: loop0.head:
+loop0.head:
+ %loop0.i = phi i32 [ 0, %entry ], [ %loop0.inc, %loop0.latch ]
+ br label %loop1.head
+
+; CHECK: loop1.head:
+loop1.head:
+ %loop1.i = phi i32 [ 0, %loop0.head ], [ %loop1.inc, %loop1.latch ]
+ br label %loop2
+
+; CHECK: loop2:
+loop2:
+ %loop2.i = phi i32 [ 0, %loop1.head ], [ %loop2.inc, %loop2 ]
+ %loop2.inc = add nsw i32 %loop2.i, 1
+ %loop2.cmp = icmp slt i32 %loop2.inc, %n
+ ; CHECK: br i1 %loop2.cmp, label %loop2, label %loop1.latch, !prof !0, !llvm.loop ![[#NESTED_LOOP2:]]
+ br i1 %loop2.cmp, label %loop2, label %loop1.latch, !prof !0
+
+; CHECK: loop1.latch:
+loop1.latch:
+ %loop1.inc = add nsw i32 %loop1.i, 1
+ %loop1.cmp = icmp slt i32 %loop1.inc, %n
+ ; CHECK: br i1 %loop1.cmp, label %loop1.head, label %loop0.latch, !prof !0, !llvm.loop ![[#NESTED_LOOP1:]]
+ br i1 %loop1.cmp, label %loop1.head, label %loop0.latch, !prof !0
+
+; CHECK: loop0.latch:
+loop0.latch:
+ %loop0.inc = add nsw i32 %loop0.i, 1
+ %loop0.cmp = icmp slt i32 %loop0.inc, %n
+ ; CHECK: br i1 %loop0.cmp, label %loop0.head, label %exit, !prof !0, !llvm.loop ![[#NESTED_LOOP0:]]
+ br i1 %loop0.cmp, label %loop0.head, label %exit, !prof !0
+
+; CHECK: exit:
+exit:
+ ret void
+}
+
+; CHECK: !0 = !{!"branch_weights", i32 9, i32 1}
+;
+; CHECK: ![[#ESTIMABLE]] = distinct !{![[#ESTIMABLE]], ![[#ESTIMABLE_TC:]]}
+; CHECK: ![[#ESTIMABLE_TC]] = !{!"llvm.loop.estimated_trip_count", i32 10}
+;
+; CHECK: ![[#NO_BRANCH_WEIGHTS]] = distinct !{![[#NO_BRANCH_WEIGHTS]], ![[#INESTIMABLE_TC:]]}
+; CHECK: ![[#INESTIMABLE_TC]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: ![[#MULTI_LATCH]] = distinct !{![[#MULTI_LATCH]], ![[#INESTIMABLE_TC:]]}
+;
+; CHECK: ![[#VAL_ESTIMABLE]] = distinct !{![[#VAL_ESTIMABLE]], ![[#VAL_TC:]]}
+; CHECK: ![[#VAL_TC]] = !{!"llvm.loop.estimated_trip_count", i32 5}
+; CHECK: ![[#VAL_INESTIMABLE]] = distinct !{![[#VAL_INESTIMABLE]], ![[#VAL_TC:]]}
+;
+; CHECK: ![[#NO_VAL_ESTIMABLE]] = distinct !{![[#NO_VAL_ESTIMABLE]], ![[#ESTIMABLE_TC:]]}
+; CHECK: ![[#NO_VAL_INESTIMABLE]] = distinct !{![[#NO_VAL_INESTIMABLE]], ![[#INESTIMABLE_TC:]]}
+;
+; CHECK: ![[#NESTED_LOOP2]] = distinct !{![[#NESTED_LOOP2]], ![[#ESTIMABLE_TC:]]}
+; CHECK: ![[#NESTED_LOOP1]] = distinct !{![[#NESTED_LOOP1]], ![[#ESTIMABLE_TC:]]}
+; CHECK: ![[#NESTED_LOOP0]] = distinct !{![[#NESTED_LOOP0]], ![[#ESTIMABLE_TC:]]}
+!0 = !{!"branch_weights", i32 9, i32 1}
+!1 = distinct !{!1, !2}
+!2 = !{!"llvm.loop.estimated_trip_count", i32 5}
+!3 = distinct !{!3, !2}
+!5 = distinct !{!5, !6}
+!6 = !{!"llvm.loop.estimated_trip_count"}
+!7 = distinct !{!7, !6}
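
Note on the expected values in the test above: with `!{!"branch_weights", i32 9, i32 1}` on a latch whose first successor is the backedge, the checked `llvm.loop.estimated_trip_count` is 10, i.e. the backedge-to-exit weight ratio plus one. The following is only a minimal sketch of that arithmetic, assuming a single latch and round-to-nearest division. `estimateTripCount` is a hypothetical helper written for illustration; it is not an LLVM API and not necessarily how the patch computes the value.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of LLVM): estimate a loop's trip count from
// the two weights on its latch branch. Assumes a single latch whose branch
// either takes the backedge or exits the loop.
std::optional<uint64_t> estimateTripCount(uint64_t BackedgeWeight,
                                          uint64_t ExitWeight) {
  // Assumption: a zero exit weight gives no usable ratio, so this sketch
  // reports "inestimable" rather than dividing by zero.
  if (ExitWeight == 0)
    return std::nullopt;
  // Backedges taken per loop exit, rounded to the nearest integer.
  uint64_t BackedgesPerExit = (BackedgeWeight + ExitWeight / 2) / ExitWeight;
  // The body runs once more than the backedge is taken.
  return BackedgesPerExit + 1;
}
```

For the weights used in `pgo-estimate-trip-counts.ll`, `estimateTripCount(9, 1)` yields 10, which matches the `ESTIMABLE_TC` check line. The sketch takes the weights in (backedge, exit) order, so a caller would have to swap them for a latch whose first successor is the exit block.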
>From 47fbe85b5c85fa69d3f0553a9bbce9b3114d3c58 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 21 Jul 2025 19:52:52 -0400
Subject: [PATCH 10/27] Add PGOEstimateTripCounts in more cases
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 6 ++---
llvm/test/Other/new-pm-defaults.ll | 19 +++++++++++++-
.../Other/new-pm-thinlto-postlink-defaults.ll | 21 ++++++++++++++--
.../new-pm-thinlto-postlink-pgo-defaults.ll | 19 +++++++++++++-
.../Other/new-pm-thinlto-prelink-defaults.ll | 19 +++++++++++++-
.../Transforms/LoopUnroll/unroll-cleanup.ll | 12 +++++----
.../LoopVectorize/X86/already-vectorized.ll | 5 ++--
.../LoopVectorize/vect.omp.persistence.ll | 3 ++-
.../AArch64/block_scaling_decompr_8bit.ll | 3 ++-
.../AArch64/extra-unroll-simplifications.ll | 9 ++++---
.../AArch64/hoist-runtime-checks.ll | 25 ++++++++++---------
.../AArch64/indvars-vectorization.ll | 11 ++++----
.../AArch64/predicated-reduction.ll | 24 ++++++++++--------
.../Transforms/PhaseOrdering/X86/pr88239.ll | 7 +++---
.../X86/preserve-access-group.ll | 11 ++++----
.../PhaseOrdering/branch-dom-cond.ll | 6 ++++-
16 files changed, 141 insertions(+), 59 deletions(-)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index fc0d88e710426..d3f741d09d724 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1240,6 +1240,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(AssignGUIDPass());
if (IsCtxProfUse) {
MPM.addPass(PGOCtxProfFlatteningPass(/*IsPreThinlink=*/true));
+ MPM.addPass(PGOEstimateTripCountsPass());
return MPM;
}
// Block further inlining in the instrumented ctxprof case. This avoids
@@ -1271,11 +1272,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
if (PGOOpt && (PGOOpt->Action == PGOOptions::IRUse ||
PGOOpt->Action == PGOOptions::SampleUse)) {
MPM.addPass(PGOForceFunctionAttrsPass(PGOOpt->ColdOptType));
- // TODO: Is this the right place for this pass? Should we enable it in any
- // other case, such as when __builtin_expect_with_probability or
- // __builtin_expect appears in the source code but profiles are not read?
- MPM.addPass(PGOEstimateTripCountsPass());
}
+ MPM.addPass(PGOEstimateTripCountsPass());
MPM.addPass(AlwaysInlinerPass(/*InsertLifetimeIntrinsics=*/true));
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index c554fdbf4c799..afa31015288ae 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -6,6 +6,9 @@
; to be invalidated.
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
+;
+; FIXME: Invalidation has been showing up for years. Something needs to be
+; fixed, perhaps just the above comment.
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='default<O1>' -S %s 2>&1 \
@@ -131,7 +134,11 @@
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-EP-PEEPHOLE-NEXT: Running pass: NoOpFunctionPass
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Running analysis: LoopAnalysis
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -139,23 +146,31 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-EP-CGSCC-LATE-NEXT: Running pass: NoOpCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O-NEXT: Running analysis: BasicAA
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
+; CHECK-O-NEXT: Running analysis: TypeBasedAA
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -164,6 +179,7 @@
; CHECK-JUMP-TABLE-TO-SWITCH-NEXT: Running pass: JumpTableToSwitchPass
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -238,6 +254,7 @@
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-DEFAULT-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-LTO-NOT: Running pass: EliminateAvailableExternallyPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 62bb02d9b3c40..355b80c54a174 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -7,6 +7,9 @@
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
;
+; FIXME: Invalidation has been showing up for years. Something needs to be
+; fixed, perhaps just the above comment.
+;
; Postlink pipelines:
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto<O1>' -S %s 2>&1 \
@@ -62,30 +65,42 @@
; CHECK-O-NEXT: Running analysis: TypeBasedAA
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Running analysis: LoopAnalysis
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module>
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O-NEXT: Running analysis: BasicAA on foo
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
+; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -93,6 +108,7 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -162,6 +178,7 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-POSTLINK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-POSTLINK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-POSTLINK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-POSTLINK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-POSTLINK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 0da7a9f73bdce..8328e4e69b0b9 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -51,29 +51,40 @@
; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O-NEXT: Running analysis: BasicAA
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
+; CHECK-O-NEXT: Running analysis: TypeBasedAA
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -81,6 +92,11 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
+; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
+; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
+; CHECK-O-NEXT: Running analysis: LoopAnalysis
+; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -146,6 +162,7 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 5aacd26def2be..a2248bc4d7f4f 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -7,6 +7,9 @@
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
;
+; FIXME: Invalidation has been showing up for years. Something needs to be
+; fixed, perhaps just the above comment.
+;
; Prelink pipelines:
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O1>' -S %s 2>&1 \
@@ -94,7 +97,11 @@
; CHECK-O-NEXT: Running analysis: TypeBasedAA
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
+; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
+; CHECK-O-NEXT: Running analysis: LoopAnalysis
+; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -102,22 +109,30 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
+; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
+; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
+; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
+; CHECK-O-NEXT: Running analysis: BasicAA
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
+; CHECK-O-NEXT: Running analysis: TypeBasedAA
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -125,6 +140,7 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -190,6 +206,7 @@
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
+; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-EXT: Running pass: {{.*}}::Bye
; CHECK-EP-OPT-EARLY-NEXT: Running pass: NoOpModulePass
diff --git a/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll b/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
index 3353a59ea1d12..67ef89aa51369 100644
--- a/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
+++ b/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
@@ -31,15 +31,15 @@ define void @_Z3fn1v(ptr %r, ptr %a) #0 {
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A_021:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[T2:%.*]], -1
; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
-; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[SCEVGEP:%.*]], i64 [[TMP1]]
+; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[TMP1]]
; CHECK-NEXT: [[T1_PRE:%.*]] = load i32, ptr @b, align 4
; CHECK-NEXT: br label %[[FOR_COND_LOOPEXIT:.*]]
; CHECK: [[FOR_COND_LOOPEXIT]]:
; CHECK-NEXT: [[T1:%.*]] = phi i32 [ [[T12:%.*]], %[[FOR_BODY]] ], [ [[T1_PRE]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
; CHECK-NEXT: [[R_1_LCSSA:%.*]] = phi ptr [ [[R_022:%.*]], %[[FOR_BODY]] ], [ [[ADD_PTR_LCSSA]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
-; CHECK-NEXT: [[A_1_LCSSA:%.*]] = phi ptr [ [[A_021:%.*]], %[[FOR_BODY]] ], [ [[SCEVGEP1]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
+; CHECK-NEXT: [[A_1_LCSSA:%.*]] = phi ptr [ [[A_021]], %[[FOR_BODY]] ], [ [[SCEVGEP1]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[T1]], 0
-; CHECK-NEXT: br i1 [[TOBOOL]], label %[[FOR_END6]], label %[[FOR_BODY]]
+; CHECK-NEXT: br i1 [[TOBOOL]], label %[[FOR_END6]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[FOR_BODY]]:
; CHECK-NEXT: [[T12]] = phi i32 [ [[T1]], %[[FOR_COND_LOOPEXIT]] ], [ [[T]], %[[ENTRY]] ]
; CHECK-NEXT: [[R_022]] = phi ptr [ [[R_1_LCSSA]], %[[FOR_COND_LOOPEXIT]] ], [ [[R]], %[[ENTRY]] ]
@@ -106,7 +106,7 @@ define void @_Z3fn1v(ptr %r, ptr %a) #0 {
; CHECK-NEXT: [[INCDEC_PTR_1]] = getelementptr inbounds nuw i8, ptr [[A_116]], i64 2
; CHECK-NEXT: [[ADD_PTR_1]] = getelementptr inbounds nuw i8, ptr [[R_117]], i64 12
; CHECK-NEXT: [[TOBOOL2_1:%.*]] = icmp eq i32 [[DEC18_1]], 0
-; CHECK-NEXT: br i1 [[TOBOOL2_1]], label %[[FOR_COND_LOOPEXIT_LOOPEXIT]], label %[[FOR_BODY3]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: br i1 [[TOBOOL2_1]], label %[[FOR_COND_LOOPEXIT_LOOPEXIT]], label %[[FOR_BODY3]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK: [[FOR_END6]]:
; CHECK-NEXT: ret void
;
@@ -174,5 +174,7 @@ for.end6: ; preds = %for.cond.for.end6_c
!1 = !{!"llvm.loop.unroll.count", i32 2}
;.
; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META1]], [[META3:![0-9]+]]}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll b/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
index 41fdcf2eb56d0..fba87072f8b24 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
@@ -41,7 +41,8 @@ for.end: ; preds = %for.body
}
; Now, we check for the Hint metadata
-; CHECK: [[vect]] = distinct !{[[vect]], [[width:![0-9]+]], [[runtime_unroll:![0-9]+]]}
+; CHECK: [[vect]] = distinct !{[[vect]], ![[#etc:]], [[width:![0-9]+]], [[runtime_unroll:![0-9]+]]}
+; CHECK: [[#etc]] = !{!"llvm.loop.estimated_trip_count"}
; CHECK: [[width]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[runtime_unroll]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[scalar]] = distinct !{[[scalar]], [[runtime_unroll]], [[width]]}
+; CHECK: [[scalar]] = distinct !{[[scalar]], ![[#etc]], [[runtime_unroll]], [[width]]}
diff --git a/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll b/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
index 2c268664ef991..9b031e31944ad 100644
--- a/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
+++ b/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
@@ -13,8 +13,9 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3
;
; CHECK-LABEL: @foo
; CHECK: !llvm.loop !0
-; CHECK: !0 = distinct !{!0, !1}
+; CHECK: !0 = distinct !{!0, !1, !2}
; CHECK: !1 = !{!"llvm.loop.vectorize.enable", i1 true}
+; CHECK: !2 = !{!"llvm.loop.estimated_trip_count"}
define i32 @foo(i32 %a) {
entry:
br label %loop_cond
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
index 7175816963ed1..5694dd40ef59b 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
@@ -801,6 +801,7 @@ attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
!4 = distinct !{!4, !5}
!5 = !{!"llvm.loop.mustprogress"}
;.
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]]}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
; CHECK: [[META5]] = !{!"llvm.loop.mustprogress"}
+; CHECK: [[META6]] = !{!"llvm.loop.estimated_trip_count"}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
index d8fc42bfaaec2..496e33d09d2a1 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
@@ -107,7 +107,7 @@ define void @cse_matching_load_from_previous_unrolled_iteration(i32 %N, ptr %src
; CHECK-NEXT: [[INDVARS_IV_NEXT_1]] = add nuw nsw i64 [[INDVARS_IV]], 2
; CHECK-NEXT: [[NITER_NEXT_1]] = add i64 [[NITER]], 2
; CHECK-NEXT: [[NITER_NCMP_1:%.*]] = icmp eq i64 [[NITER_NEXT_1]], [[UNROLL_ITER]]
-; CHECK-NEXT: br i1 [[NITER_NCMP_1]], label [[EXIT_LOOPEXIT_UNR_LCSSA]], label [[LOOP_LATCH]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[NITER_NCMP_1]], label [[EXIT_LOOPEXIT_UNR_LCSSA]], label [[LOOP_LATCH]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: exit.loopexit.unr-lcssa:
; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ 0, [[LOOP_LATCH_PREHEADER]] ], [ [[INDVARS_IV_NEXT_1]], [[LOOP_LATCH]] ]
; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
@@ -154,8 +154,9 @@ exit:
!1 = !{!"llvm.loop.mustprogress"}
!2 = !{!"llvm.loop.unroll.count", i32 2}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
; CHECK: [[META1]] = !{!"llvm.loop.mustprogress"}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
+; CHECK: [[META2]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]], [[META3]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
index b4b12da3244b2..f9f5be3647c07 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
@@ -54,7 +54,7 @@ define i32 @read_only_loop_with_runtime_check(ptr noundef %array, i32 noundef %c
; CHECK-NEXT: [[ADD]] = add nsw i32 [[TMP8]], [[SUM_07]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[TMP0]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: if.then:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -129,7 +129,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
; CHECK-NEXT: [[DOTNOT_NOT:%.*]] = icmp ugt i64 [[S_COERCE1]], [[TMP0]]
-; CHECK-NEXT: br i1 [[DOTNOT_NOT]], label [[FOR_BODY_PREHEADER8:%.*]], label [[COND_FALSE_I:%.*]], !prof [[PROF4:![0-9]+]]
+; CHECK-NEXT: br i1 [[DOTNOT_NOT]], label [[FOR_BODY_PREHEADER8:%.*]], label [[COND_FALSE_I:%.*]], !prof [[PROF5:![0-9]+]]
; CHECK: for.body.preheader8:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER11:%.*]], label [[VECTOR_PH:%.*]]
@@ -148,7 +148,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[WIDE_LOAD10]], [[VEC_PHI9]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP5]], label [[SPAN_CHECKED_ACCESS_EXIT:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP5]], label [[SPAN_CHECKED_ACCESS_EXIT:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD:%.*]] = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX]])
@@ -169,7 +169,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK-NEXT: [[ADD1]] = add nsw i32 [[TMP7]], [[RET_06]]
; CHECK-NEXT: [[INC]] = add nuw i64 [[I_07]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY1]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY1]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: cond.false.i:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -228,7 +228,7 @@ define hidden noundef nonnull align 4 dereferenceable(4) ptr @span_checked_acces
; CHECK-NEXT: [[__SIZE__I:%.*]] = getelementptr inbounds nuw i8, ptr [[THIS]], i64 8
; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[__SIZE__I]], align 8
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[__IDX]], [[TMP0]]
-; CHECK-NEXT: br i1 [[CMP]], label [[COND_END:%.*]], label [[COND_FALSE:%.*]], !prof [[PROF4]]
+; CHECK-NEXT: br i1 [[CMP]], label [[COND_END:%.*]], label [[COND_FALSE:%.*]], !prof [[PROF5]]
; CHECK: cond.false:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -289,11 +289,12 @@ declare void @llvm.trap()
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
-; CHECK: [[PROF4]] = !{!"branch_weights", !"expected", i32 2000, i32 1}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]]}
-; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META2]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META3]], [[META2]]}
+; CHECK: [[PROF5]] = !{!"branch_weights", !"expected", i32 2000, i32 1}
+; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META3]], [[META2]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
index b056f44a6c469..552035c0fca6a 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
@@ -88,7 +88,7 @@ define void @s172(i32 noundef %xa, i32 noundef %xb, ptr noundef %a, ptr noundef
; CHECK-NEXT: store i32 [[ADD]], ptr [[GEP_A]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], [[TMP1]]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 32000
-; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: for.end:
; CHECK-NEXT: ret void
;
@@ -131,9 +131,10 @@ for.end:
; CHECK: [[META2]] = distinct !{[[META2]], !"LVerDomain"}
; CHECK: [[META3]] = !{[[META4:![0-9]+]]}
; CHECK: [[META4]] = distinct !{[[META4]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]], [[META8:![0-9]+]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]], [[META8:![0-9]+]], [[META9:![0-9]+]]}
; CHECK: [[META6]] = !{!"llvm.loop.mustprogress"}
-; CHECK: [[META7]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META8]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META6]], [[META7]]}
+; CHECK: [[META7]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META8]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META9]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP10]] = distinct !{[[LOOP10]], [[META6]], [[META7]], [[META8]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
index c7098d2ce96ce..39c438279a77e 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
@@ -81,7 +81,7 @@ define nofpclass(nan inf) double @monte_simple(i32 noundef %nblocks, i32 noundef
; CHECK-NEXT: [[V1_2]] = fadd reassoc arcp contract afn double [[V1_012]], [[ADD4]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[FOR_END_LOOPEXIT]]:
; CHECK-NEXT: [[V0_1:%.*]] = phi double [ [[TMP22]], %[[MIDDLE_BLOCK]] ], [ [[V0_2]], %[[FOR_BODY]] ]
; CHECK-NEXT: [[V1_1:%.*]] = phi double [ [[TMP21]], %[[MIDDLE_BLOCK]] ], [ [[V1_2]], %[[FOR_BODY]] ]
@@ -239,7 +239,7 @@ define nofpclass(nan inf) double @monte_exp(i32 noundef %nblocks, i32 noundef %R
; CHECK-NEXT: [[TMP23]] = fadd reassoc arcp contract afn <4 x double> [[VEC_PHI31]], [[TMP21]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDVARS_IV1]], 8
; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP24]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP24]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc arcp contract afn <4 x double> [[TMP23]], [[TMP22]]
; CHECK-NEXT: [[TMP25:%.*]] = tail call reassoc arcp contract afn double @llvm.vector.reduce.fadd.v4f64(double -0.000000e+00, <4 x double> [[BIN_RDX]])
@@ -269,19 +269,19 @@ define nofpclass(nan inf) double @monte_exp(i32 noundef %nblocks, i32 noundef %R
; CHECK-NEXT: [[V1_2_US]] = fadd reassoc arcp contract afn double [[V1_116_US]], [[ADD7_US1]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND25_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND25_NOT]], label %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]], label %[[FOR_BODY3_US]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND25_NOT]], label %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]], label %[[FOR_BODY3_US]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: [[FOR_COND1_FOR_INC8_CRIT_EDGE_US]]:
; CHECK-NEXT: [[V0_2_US_LCSSA]] = phi double [ [[TMP26]], %[[MIDDLE_BLOCK]] ], [ [[V0_2_US]], %[[FOR_BODY3_US]] ]
; CHECK-NEXT: [[V1_2_US_LCSSA]] = phi double [ [[TMP25]], %[[MIDDLE_BLOCK]] ], [ [[V1_2_US]], %[[FOR_BODY3_US]] ]
; CHECK-NEXT: [[INC9_US]] = add nuw nsw i32 [[BLOCK_017_US]], 1
; CHECK-NEXT: [[EXITCOND26_NOT:%.*]] = icmp eq i32 [[INC9_US]], [[NBLOCKS]]
-; CHECK-NEXT: br i1 [[EXITCOND26_NOT]], label %[[FOR_END10]], label %[[FOR_BODY_US]]
+; CHECK-NEXT: br i1 [[EXITCOND26_NOT]], label %[[FOR_END10]], label %[[FOR_BODY_US]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[FOR_BODY]]:
; CHECK-NEXT: [[BLOCK_017:%.*]] = phi i32 [ [[INC9:%.*]], %[[FOR_BODY]] ], [ 0, %[[FOR_BODY_LR_PH]] ]
; CHECK-NEXT: tail call void @resample(i32 noundef [[RAND_BLOCK_LENGTH]], ptr noundef [[SAMPLES]])
; CHECK-NEXT: [[INC9]] = add nuw nsw i32 [[BLOCK_017]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC9]], [[NBLOCKS]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END10]], label %[[FOR_BODY]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END10]], label %[[FOR_BODY]], !llvm.loop [[LOOP7]]
; CHECK: [[FOR_END10]]:
; CHECK-NEXT: [[V0_0_LCSSA:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ], [ [[V0_2_US_LCSSA]], %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]] ], [ 0.000000e+00, %[[FOR_BODY]] ]
; CHECK-NEXT: [[V1_0_LCSSA:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ], [ [[V1_2_US_LCSSA]], %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]] ], [ 0.000000e+00, %[[FOR_BODY]] ]
@@ -403,10 +403,12 @@ declare void @resample(i32 noundef, ptr noundef)
declare double @llvm.exp2.f64(double)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture)
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META3]], [[META2]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META3]], [[META2]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll b/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
index c98e7d349e6c0..4a794199a5415 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
@@ -48,7 +48,8 @@ define void @foo(ptr noalias noundef %0, ptr noalias noundef %1) optsize {
br label %3
}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll b/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
index be7f4c2a941a0..cacc73f7bc6b2 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
@@ -63,7 +63,7 @@ define void @test(i32 noundef %nface, i32 noundef %ncell, ptr noalias noundef %f
; CHECK-NEXT: store double [[TMP26]], ptr [[ARRAYIDX4_3]], align 8, !tbaa [[TBAA5]], !llvm.access.group [[ACC_GRP4]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[TMP0]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
;
entry:
%nface.addr = alloca i32, align 4
@@ -197,10 +197,11 @@ attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: re
; CHECK: [[ACC_GRP4]] = distinct !{}
; CHECK: [[TBAA5]] = !{[[META6:![0-9]+]], [[META6]], i64 0}
; CHECK: [[META6]] = !{!"double", [[META2]], i64 0}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META9:![0-9]+]], [[META10:![0-9]+]], [[META11:![0-9]+]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META9:![0-9]+]], [[META10:![0-9]+]], [[META11:![0-9]+]], [[META12:![0-9]+]]}
; CHECK: [[META8]] = !{!"llvm.loop.mustprogress"}
; CHECK: [[META9]] = !{!"llvm.loop.parallel_accesses", [[ACC_GRP4]]}
-; CHECK: [[META10]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META11]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP12]] = distinct !{[[LOOP12]], [[META8]], [[META9]], [[META11]], [[META10]]}
+; CHECK: [[META10]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META11]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META12]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP13]] = distinct !{[[LOOP13]], [[META8]], [[META9]], [[META10]], [[META12]], [[META11]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll b/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
index 904ea6c2d9e73..c013984202a29 100644
--- a/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
+++ b/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
@@ -13,7 +13,7 @@ define void @growTables(ptr %p) {
; CHECK-NEXT: [[CALL9:%.*]] = load volatile ptr, ptr [[P]], align 8
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_02]], 1
; CHECK-NEXT: [[CMP7:%.*]] = icmp slt i32 [[INC]], [[CALL]]
-; CHECK-NEXT: br i1 [[CMP7]], label %[[FOR_BODY]], label %[[FOR_BODY12:.*]]
+; CHECK-NEXT: br i1 [[CMP7]], label %[[FOR_BODY]], label %[[FOR_BODY12:.*]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[FOR_BODY12]]:
; CHECK-NEXT: [[CALL14:%.*]] = load volatile ptr, ptr [[P]], align 8
; CHECK-NEXT: br label %[[COMMON_RET]]
@@ -45,3 +45,7 @@ for.body12:
common.ret:
ret void
}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+;.
>From f8097fbf978f47e8ff7afaa0279573355c6c787a Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 21 Jul 2025 19:58:54 -0400
Subject: [PATCH 11/27] Add unused initialization
---
llvm/lib/Transforms/Utils/LoopUtils.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index a3e473022a512..4894d6129072a 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -894,7 +894,7 @@ std::optional<unsigned> llvm::getLoopEstimatedTripCount(
// Return the estimated trip count from metadata unless the metadata is
// missing or has no value.
- bool Missing;
+ bool Missing = false; // Initialization is expected to be unused.
if (auto TC = getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount,
&Missing)) {
LLVM_DEBUG(dbgs() << "getLoopEstimatedTripCount: "
>From 7b27203f488a44288ad213c4ac31940e30d5b17a Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 21 Jul 2025 20:15:34 -0400
Subject: [PATCH 12/27] Simplify some test changes
---
...-pm-thinlto-postlink-samplepgo-defaults.ll | 27 +++++++------------
...w-pm-thinlto-prelink-samplepgo-defaults.ll | 15 ++++-------
2 files changed, 14 insertions(+), 28 deletions(-)
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 516f1bf972429..0a1e152932da8 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -91,32 +91,23 @@
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O23SZ-NEXT: Running analysis: BasicAA on foo
-; CHECK-O23SZ-NEXT: Running analysis: ScopedNoAliasAA on foo
-; CHECK-O23SZ-NEXT: Running analysis: TypeBasedAA on foo
-; CHECK-O23SZ-NEXT: Running analysis: OuterAnalysisManagerProxy
+; CHECK-O-NEXT: Running analysis: BasicAA on foo
+; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
+; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
-; CHECK-O1-NEXT: Running analysis: BasicAA on foo
-; CHECK-O1-NEXT: Running analysis: ScopedNoAliasAA on foo
-; CHECK-O1-NEXT: Running analysis: TypeBasedAA on foo
-; CHECK-O1-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O23SZ-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
-; CHECK-O1-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O3-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 0f8e1c770a4dc..e2db335d39397 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -108,17 +108,12 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O23SZ-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
+; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
+; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
+; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
+; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
-; CHECK-O1-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O1-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O3-NEXT: Running pass: LibCallsShrinkWrapPass
>From 4c4669a8bce3165f4ba460e80d31dd136c16dceb Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Wed, 23 Jul 2025 21:09:57 -0400
Subject: [PATCH 13/27] Extend verify pass to cover new metadata
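The new check accepts `llvm.loop.estimated_trip_count` with or without a
count and rejects anything else. As a rough, LLVM-free model of the rule
(the `Operand` struct and `checkEstimatedTripCount` are hypothetical names
used only for illustration; they are not part of this patch or of LLVM's
API):
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a metadata operand; not an LLVM type.
struct Operand {
  enum Kind { String, IntConstant, Node } kind;
  std::string text;  // Meaningful when kind == String.
  unsigned bitWidth; // Meaningful when kind == IntConstant.
};

// Returns the diagnostic this model would report, or an empty string if the
// node is accepted. Assumes operand 0 was already matched as the string
// "llvm.loop.estimated_trip_count".
std::string checkEstimatedTripCount(const std::vector<Operand> &ops) {
  assert(!ops.empty() && ops[0].kind == Operand::String);
  // Only the name, or the name plus one count, is allowed.
  if (ops.size() != 1 && ops.size() != 2)
    return "Expected one or two operands";
  // If a count is present, it must be an integer constant of at most 32 bits.
  if (ops.size() == 2 &&
      !(ops[1].kind == Operand::IntConstant && ops[1].bitWidth <= 32))
    return "Expected optional second operand to be an integer constant of "
           "type i32 or smaller";
  return "";
}
```
Under this model, `!{!"llvm.loop.estimated_trip_count", i16 5}` is accepted,
while an `i64` count, a non-integer second operand, or a third operand
produces the corresponding diagnostic.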
---
llvm/docs/LangRef.rst | 4 +-
llvm/include/llvm/IR/Metadata.h | 4 +-
.../include/llvm/Transforms/Utils/LoopUtils.h | 2 +
llvm/lib/IR/Verifier.cpp | 16 +++++
llvm/lib/Transforms/Utils/LoopUtils.cpp | 2 -
.../llvm.loop.estimated_trip_count.ll | 58 +++++++++++++++++++
6 files changed, 80 insertions(+), 6 deletions(-)
create mode 100644 llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 5bcb65fdf44a9..ceff9a7e2d9f2 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7950,8 +7950,8 @@ loop distribution pass. See
This metadata records an estimated trip count for the loop. The first operand
is the string ``llvm.loop.estimated_trip_count``. The second operand is an
-integer specifying the count, which might be omitted for the reasons described
-below. For example:
+integer constant of type ``i32`` or smaller specifying the count, which might be
+omitted for the reasons described below. For example:
.. code-block:: llvm
diff --git a/llvm/include/llvm/IR/Metadata.h b/llvm/include/llvm/IR/Metadata.h
index af252aa24567a..93b3ab967844a 100644
--- a/llvm/include/llvm/IR/Metadata.h
+++ b/llvm/include/llvm/IR/Metadata.h
@@ -911,8 +911,8 @@ class MDOperand {
// Check if MDOperand is of type MDString and equals `Str`.
bool equalsStr(StringRef Str) const {
- return isa<MDString>(this->get()) &&
- cast<MDString>(this->get())->getString() == Str;
+ return isa_and_nonnull<MDString>(get()) &&
+ cast<MDString>(get())->getString() == Str;
}
~MDOperand() { untrack(); }
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 7d03fb0d81e4c..73394bf1380f0 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -52,6 +52,8 @@ typedef std::pair<const RuntimeCheckingPtrGroup *,
template <typename T, unsigned N> class SmallSetVector;
template <typename T, unsigned N> class SmallPriorityWorklist;
+const char *const LLVMLoopEstimatedTripCount = "llvm.loop.estimated_trip_count";
+
LLVM_ABI BasicBlock *InsertPreheaderForLoop(Loop *L, DominatorTree *DT,
LoopInfo *LI,
MemorySSAUpdater *MSSAU,
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index ef37f70bf8aea..84eb3960a274d 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -121,6 +121,7 @@
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/ModRef.h"
#include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
#include <algorithm>
#include <cassert>
#include <cstdint>
@@ -1071,6 +1072,21 @@ void Verifier::visitMDNode(const MDNode &MD, AreDebugLocsAllowed AllowLocs) {
}
}
+ // Check llvm.loop.estimated_trip_count.
+ if (MD.getNumOperands() > 0 &&
+ MD.getOperand(0).equalsStr(LLVMLoopEstimatedTripCount)) {
+ Check(MD.getNumOperands() == 1 || MD.getNumOperands() == 2,
+ "Expected one or two operands", &MD);
+ if (MD.getNumOperands() == 2) {
+ auto *Count = dyn_cast_or_null<ConstantAsMetadata>(MD.getOperand(1));
+ Check(Count && Count->getType()->isIntegerTy() &&
+ cast<IntegerType>(Count->getType())->getBitWidth() <= 32,
+ "Expected optional second operand to be an integer constant of "
+ "type i32 or smaller",
+ &MD);
+ }
+ }
+
// Check these last, so we diagnose problems in operands first.
Check(!MD.isTemporary(), "Expected no forward declarations!", &MD);
Check(MD.isResolved(), "All nodes should be resolved!", &MD);
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 4894d6129072a..b4aa9a6758dd1 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -54,8 +54,6 @@ using namespace llvm::PatternMatch;
static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
-static const char *LLVMLoopEstimatedTripCount =
- "llvm.loop.estimated_trip_count";
bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
MemorySSAUpdater *MSSAU,
diff --git a/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll b/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
new file mode 100644
index 0000000000000..0c44d960be244
--- /dev/null
+++ b/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
@@ -0,0 +1,58 @@
+; Test "llvm.loop.estimated_trip_count" validation
+
+; DEFINE: %{RUN} = opt -passes=verify %t -disable-output 2>&1 | \
+; DEFINE: FileCheck %s -allow-empty -check-prefix
+
+define void @test() {
+entry:
+ br label %body
+body:
+ br i1 0, label %body, label %exit, !llvm.loop !0
+exit:
+ ret void
+}
+!0 = distinct !{!0, !1}
+
+; GOOD-NOT: {{.}}
+
+; BAD-VALUE: Expected optional second operand to be an integer constant of type i32 or smaller
+; BAD-VALUE-NEXT: !1 = !{!"llvm.loop.estimated_trip_count",
+
+; TOO-MANY: Expected one or two operands
+; TOO-MANY-NEXT: !1 = !{!"llvm.loop.estimated_trip_count", i32 5, i32 5}
+
+; No value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count"}' >> %t
+; RUN: %{RUN} GOOD
+
+; i16 value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", i16 5}' >> %t
+; RUN: %{RUN} GOOD
+
+; i32 value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", i32 5}' >> %t
+; RUN: %{RUN} GOOD
+
+; i64 value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", i64 5}' >> %t
+; RUN: not %{RUN} BAD-VALUE
+
+; MDString value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", !"5"}' >> %t
+; RUN: not %{RUN} BAD-VALUE
+
+; MDNode value.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", !2}' >> %t
+; RUN: echo '!2 = !{i32 5}' >> %t
+; RUN: not %{RUN} BAD-VALUE
+
+; Too many values.
+; RUN: cp %s %t
+; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count", i32 5, i32 5}' >> %t
+; RUN: not %{RUN} TOO-MANY
>From 0f40efd567a94f4a64047b8a06ae51f831aec82a Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 24 Jul 2025 11:36:29 -0400
Subject: [PATCH 14/27] Fix test for some builds
Somehow, on some of my builds, `llvm::` prefixes are dropped from some
symbol names in the printed pass list.
---
llvm/test/Other/new-pm-thinlto-postlink-defaults.ll | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 355b80c54a174..b1f3262079a0d 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -77,7 +77,7 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module>
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis, {{.*}}Module>
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
>From 6148922ff7cd9ead73ce42988828e01f4adca794 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 24 Jul 2025 15:46:32 -0400
Subject: [PATCH 15/27] Apply some small reviewer suggestions
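One of these suggestions touches the code in setLoopEstimatedTripCount that
derives latch branch weights from an estimated trip count. A minimal
standalone sketch of that arithmetic, assuming a hypothetical helper
`latchWeights` that is not part of the patch:
```cpp
#include <cassert>
#include <utility>

// Hypothetical helper mirroring the weight arithmetic; returns
// {backedge-taken weight, latch-exit weight}.
std::pair<unsigned, unsigned> latchWeights(unsigned estimatedTripCount,
                                           unsigned invocationWeight) {
  unsigned latchExitWeight = 0;
  unsigned backedgeTakenWeight = 0;
  // A trip count of 0 leaves both weights at 0.
  if (estimatedTripCount != 0) {
    latchExitWeight = invocationWeight;
    // The backedge is taken once per iteration except the last.
    backedgeTakenWeight = (estimatedTripCount - 1) * latchExitWeight;
  }
  return {backedgeTakenWeight, latchExitWeight};
}
```
For instance, an estimated trip count of 10 with an invocation weight of 1
gives a backedge-taken weight of 9 and a latch-exit weight of 1.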
---
llvm/include/llvm/Transforms/Utils/LoopUtils.h | 8 ++++++++
llvm/lib/Transforms/Utils/LoopUtils.cpp | 6 +++---
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 73394bf1380f0..673996002f0a0 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -355,6 +355,14 @@ LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
/// TODO: Eventually, once all passes have migrated away from setting branch
/// weights to indicate estimated trip counts, this function will drop the
/// \p EstimatedLoopInvocationWeight parameter.
+///
+/// TODO: There are also passes that currently do not consider estimated trip
+/// counts at all but that, for example, affect whether trip counts can be
+/// estimated from branch weights. Once all such passes have been adjusted to
+/// update this metadata, this function might stop estimating trip counts from
+/// branch weights and instead simply get the \c llvm.loop.estimated_trip_count
+/// metadata. See also the \c llvm.loop.estimated_trip_count entry in
+/// \c LangRef.rst.
LLVM_ABI std::optional<unsigned>
getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight = nullptr,
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index b4aa9a6758dd1..9043baac7be8e 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -226,7 +226,7 @@ bool llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
if (NumOps == 1 || NumOps == 2) {
MDString *S = dyn_cast<MDString>(Node->getOperand(0));
if (S && S->getString() == StringMD) {
- // If it is already in place, do nothing.
+ // If the metadata and any value are already as specified, do nothing.
if (NumOps == 2 && V) {
ConstantInt *IntMD =
mdconst::extract_or_null<ConstantInt>(Node->getOperand(1));
@@ -879,7 +879,7 @@ std::optional<unsigned> llvm::getLoopEstimatedTripCount(
// EstimatedLoopInvocationWeight parameter.
if (EstimatedLoopInvocationWeight) {
if (BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L)) {
- uint64_t LoopWeight, ExitWeight;
+ uint64_t LoopWeight = 0, ExitWeight = 0; // Inits expected to be unused.
if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
return std::nullopt;
if (L->contains(ExitingBranch->getSuccessor(1)))
@@ -951,7 +951,7 @@ bool llvm::setLoopEstimatedTripCount(
unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;
- if (*EstimatedTripCount > 0) {
+ if (*EstimatedTripCount != 0) {
LatchExitWeight = *EstimatedloopInvocationWeight;
BackedgeTakenWeight = (*EstimatedTripCount - 1) * LatchExitWeight;
}
>From e5a0a26ce8f2fab0f57fb69446bcea83bb6ec5fe Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 24 Jul 2025 17:41:25 -0400
Subject: [PATCH 16/27] Update for merge from pgo-estimated-trip-count
That's PR #128785, which is now a parent PR.
First, remove a todo that's now documented more generally than
LoopPeel in plenty of other places. Second, update LoopPeel's
setLoopEstimatedTripCount call to avoid a now redundant argument that
eventually won't be supported.
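For reference, the value LoopPeel passes to that call is the old estimate
reduced by the number of peeled iterations. A small sketch of that
adjustment, assuming a hypothetical helper `remainingTripCount`; the
underflow guard is an assumption, since the hunk below shows only the two
assignments and not the condition:
```cpp
#include <cassert>

// Hypothetical helper: estimated trip count of the remaining loop after
// peeling. Clamps to 0 rather than wrapping when more iterations were peeled
// than were estimated (assumed guard, see above).
unsigned remainingTripCount(unsigned estimatedTripCount,
                            unsigned totalPeeled) {
  if (estimatedTripCount < totalPeeled)
    return 0;
  return estimatedTripCount - totalPeeled;
}
```
With an estimate of 10 and 2 peeled iterations, the remaining loop's
estimate is 8. With an estimate of 1 and 2 peeled iterations, it is 0.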
---
llvm/lib/Transforms/Utils/LoopPeel.cpp | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 09c5a08639a0f..ca2fd45a1ccc9 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -1221,15 +1221,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
// analysis later needs to determine the trip count of the remaining loop
// while examining it in isolation without considering the probability of
// actually reaching it, we store the new trip count as separate metadata.
- //
- // TODO: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
- // that don't match the restrictions of getExpectedExitLoopLatchBranch in
- // LoopUtils.cpp. For example,
- // llvm/tests/Transforms/LoopUnroll/peel-branch-weights.ll (introduced by
- // b43a4d0850d5) has multiple exits. Should we try to extend them to handle
- // such cases? For now, we just don't try to record
- // llvm.loop.estimated_trip_count for such cases, so the original branch
- // weights will have to do.
if (auto EstimatedTripCount = getLoopEstimatedTripCount(L)) {
// FIXME: The previous updateBranchWeights implementation had this
// comment:
@@ -1250,8 +1241,7 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
EstimatedTripCountNew = 0; // FIXME: = 2?
else
EstimatedTripCountNew -= TotalPeeled;
- setLoopEstimatedTripCount(L, EstimatedTripCountNew,
- /*EstimatedLoopInvocationWeight=*/std::nullopt);
+ setLoopEstimatedTripCount(L, EstimatedTripCountNew);
}
if (Loop *ParentLoop = L->getParentLoop())
>From 3a49b4395bb5880252af26ddeb8e173476398628 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 24 Jul 2025 17:59:04 -0400
Subject: [PATCH 17/27] Attempt to fix windows pre-commit CI
---
llvm/test/Other/new-pm-thinlto-postlink-defaults.ll | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index b1f3262079a0d..5b20e63d26eae 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -77,7 +77,7 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis, {{.*}}Module>
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
>From f1fa8d9e7cf4a96b9816e157fb136929b8de2466 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 7 Aug 2025 14:02:00 -0400
Subject: [PATCH 18/27] Run update script on new test from last merge
For estimated trip count metadata.
---
.../PhaseOrdering/AArch64/interleave_vec.ll | 40 ++++++++++---------
1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
index bb6f3e719bb14..49d2c231aaf72 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
@@ -141,7 +141,7 @@ define void @same_op2_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <8 x float> [[INTERLEAVED_VEC22]], ptr [[TMP7]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 576
-; CHECK-NEXT: br i1 [[TMP10]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP10]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -235,7 +235,7 @@ define void @same_op3(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <12 x float> [[INTERLEAVED_VEC]], ptr [[TMP2]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 384
-; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -347,7 +347,7 @@ define void @same_op3_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <12 x float> [[INTERLEAVED_VEC]], ptr [[TMP3]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 384
-; CHECK-NEXT: br i1 [[TMP12]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP12]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -441,7 +441,7 @@ define void @same_op4(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP2]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 288
-; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -540,7 +540,7 @@ define void @same_op4_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP3]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 288
-; CHECK-NEXT: br i1 [[TMP5]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP5]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -643,7 +643,7 @@ define void @same_op6(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[ARRAYIDX9_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 6
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1146
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -751,7 +751,7 @@ define void @same_op6_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <2 x float> [[TMP13]], ptr [[ARRAYIDX7_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 6
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1146
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END11:.*]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END11:.*]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -854,7 +854,7 @@ define void @same_op8(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[ARRAYIDX9_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 8
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1144
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]], !llvm.loop [[LOOP11:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -989,7 +989,7 @@ define void @same_op8_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP6]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], 144
-; CHECK-NEXT: br i1 [[TMP25]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP25]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -1063,13 +1063,17 @@ for.end11: ; preds = %for.cond
ret void
}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]]}
-; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META2]]}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META1]]}
+; CHECK: [[LOOP10]] = distinct !{[[LOOP10]], [[META1]]}
+; CHECK: [[LOOP11]] = distinct !{[[LOOP11]], [[META1]]}
+; CHECK: [[LOOP12]] = distinct !{[[LOOP12]], [[META1]], [[META2]], [[META3]]}
;.
>From 38ace1e58cdf1b89eb567bad6ae6e52f8ea2651f Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 7 Aug 2025 17:08:59 -0400
Subject: [PATCH 19/27] Reapply 3a18fe33f0763cd9276c99c276448412100f6270
That was applied after merging this PR last time.
---
llvm/lib/Transforms/Utils/LoopUtils.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 9043baac7be8e..6327b92922c12 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -815,11 +815,14 @@ struct DbgLoop {
const Loop *L;
explicit DbgLoop(const Loop *L) : L(L) {}
};
+
+#ifndef NDEBUG
static inline raw_ostream &operator<<(raw_ostream &OS, DbgLoop D) {
OS << "function ";
D.L->getHeader()->getParent()->printAsOperand(OS, /*PrintType=*/false);
return OS << " " << *D.L;
}
+#endif // NDEBUG
static std::optional<unsigned> estimateLoopTripCount(Loop *L) {
// Currently we take the estimate exit count only from the loop latch,
>From 92ddaa0c0dc8d077a46aa53c5d1a763bf4d8a7a0 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 7 Aug 2025 20:18:55 -0400
Subject: [PATCH 20/27] Convert to function pass, avoid needless pass
invalidation
Suggested at
<https://github.com/llvm/llvm-project/pull/148758#discussion_r2246193446>.
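
As a rough illustration of the invalidation change, here is a standalone
sketch (not LLVM code) of a pass that only attaches metadata and therefore
reports which analyses stay valid instead of invalidating everything. The
`Preserved` struct and `runToyPass` are hypothetical stand-ins for
`PreservedAnalyses` and the pass's `run` method, and the analysis names are
modeled as plain strings purely for the example:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for llvm::PreservedAnalyses; this is NOT the real API.
struct Preserved {
  bool All = false;           // true when the pass changed nothing
  std::set<std::string> Kept; // names explicitly kept valid after a change

  bool keeps(const std::string &Name) const {
    return All || Kept.count(Name) != 0;
  }
};

// Models a pass that only adds loop metadata. The CFG and call graph are
// untouched, so results that depend only on them can stay cached.
Preserved runToyPass(bool MadeChange) {
  Preserved PA;
  if (!MadeChange) {
    PA.All = true; // nothing changed: everything stays valid
    return PA;
  }
  PA.Kept = {"CFGAnalyses", "LazyCallGraphAnalysis"};
  return PA;
}
```

In this model, anything not listed (for example `LastRunTrackingAnalysis`)
is still invalidated when the pass makes a change, which is consistent with
the updated CHECK lines in the pipeline tests below.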
---
.../Instrumentation/PGOEstimateTripCounts.h | 2 +-
llvm/lib/Passes/PassBuilderPipelines.cpp | 5 ++--
llvm/lib/Passes/PassRegistry.def | 2 +-
.../Instrumentation/PGOEstimateTripCounts.cpp | 29 ++++++++++---------
llvm/test/Other/new-pm-defaults.ll | 16 ++--------
.../Other/new-pm-thinlto-postlink-defaults.ll | 17 ++---------
.../new-pm-thinlto-postlink-pgo-defaults.ll | 18 ++----------
...-pm-thinlto-postlink-samplepgo-defaults.ll | 20 ++-----------
.../Other/new-pm-thinlto-prelink-defaults.ll | 16 ++--------
.../new-pm-thinlto-prelink-pgo-defaults.ll | 5 ----
...w-pm-thinlto-prelink-samplepgo-defaults.ll | 20 ++-----------
11 files changed, 32 insertions(+), 118 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
index 1b35c1c77e5c3..529e79c31e12b 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
@@ -16,7 +16,7 @@ namespace llvm {
struct PGOEstimateTripCountsPass
: public PassInfoMixin<PGOEstimateTripCountsPass> {
PGOEstimateTripCountsPass() {}
- PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+ PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
};
} // namespace llvm
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index d3f741d09d724..462b70b748918 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1240,7 +1240,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(AssignGUIDPass());
if (IsCtxProfUse) {
MPM.addPass(PGOCtxProfFlatteningPass(/*IsPreThinlink=*/true));
- MPM.addPass(PGOEstimateTripCountsPass());
+ MPM.addPass(createModuleToFunctionPassAdaptor(
+ PGOEstimateTripCountsPass()));
return MPM;
}
// Block further inlining in the instrumented ctxprof case. This avoids
@@ -1273,7 +1274,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
PGOOpt->Action == PGOOptions::SampleUse)) {
MPM.addPass(PGOForceFunctionAttrsPass(PGOOpt->ColdOptType));
}
- MPM.addPass(PGOEstimateTripCountsPass());
+ MPM.addPass(createModuleToFunctionPassAdaptor(PGOEstimateTripCountsPass()));
MPM.addPass(AlwaysInlinerPass(/*InsertLifetimeIntrinsics=*/true));
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index f830b8ce2aa11..ab869b5ffb4b4 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -124,7 +124,6 @@ MODULE_PASS("openmp-opt", OpenMPOptPass())
MODULE_PASS("openmp-opt-postlink",
OpenMPOptPass(ThinOrFullLTOPhase::FullLTOPostLink))
MODULE_PASS("partial-inliner", PartialInlinerPass())
-MODULE_PASS("pgo-estimate-trip-counts", PGOEstimateTripCountsPass())
MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
@@ -488,6 +487,7 @@ FUNCTION_PASS("objc-arc-contract", ObjCARCContractPass())
FUNCTION_PASS("objc-arc-expand", ObjCARCExpandPass())
FUNCTION_PASS("pa-eval", PAEvalPass())
FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass())
+FUNCTION_PASS("pgo-estimate-trip-counts", PGOEstimateTripCountsPass())
FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
FUNCTION_PASS("place-safepoints", PlaceSafepointsPass())
FUNCTION_PASS("print", PrintFunctionPass(errs()))
diff --git a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
index 762aca0b897ce..7e6eb52dcaebe 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
+#include "llvm/Analysis/LazyCallGraph.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
@@ -25,21 +26,21 @@ static bool runOnLoop(Loop *L) {
return MadeChange;
}
-PreservedAnalyses PGOEstimateTripCountsPass::run(Module &M,
- ModuleAnalysisManager &AM) {
- FunctionAnalysisManager &FAM =
- AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
+PreservedAnalyses PGOEstimateTripCountsPass::run(Function &F,
+ FunctionAnalysisManager &FAM) {
bool MadeChange = false;
LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": start\n");
- for (Function &F : M) {
- if (F.isDeclaration())
- continue;
- LoopInfo *LI = &FAM.getResult<LoopAnalysis>(F);
- if (!LI)
- continue;
- for (Loop *L : *LI)
- MadeChange |= runOnLoop(L);
- }
+ LoopInfo *LI = &FAM.getResult<LoopAnalysis>(F);
+ if (!LI)
+ return PreservedAnalyses::all();
+ for (Loop *L : *LI)
+ MadeChange |= runOnLoop(L);
LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": end\n");
- return MadeChange ? PreservedAnalyses::none() : PreservedAnalyses::all();
+ if (MadeChange) {
+ PreservedAnalyses PA;
+ PA.preserveSet<CFGAnalyses>();
+ PA.preserve<LazyCallGraphAnalysis>();
+ return PA;
+ }
+ return PreservedAnalyses::all();
}
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index afa31015288ae..1b8ae9cbc2dc8 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -136,9 +136,8 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -146,31 +145,23 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-EP-CGSCC-LATE-NEXT: Running pass: NoOpCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
-; CHECK-O-NEXT: Running analysis: TypeBasedAA
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -189,10 +180,8 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
-; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
-; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
@@ -254,7 +243,6 @@
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-DEFAULT-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-LTO-NOT: Running pass: EliminateAvailableExternallyPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 5b20e63d26eae..9a4eb0b1a7499 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -67,40 +67,30 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA on foo
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
-; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -117,10 +107,8 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
-; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
-; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
@@ -178,7 +166,6 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-POSTLINK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-POSTLINK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-POSTLINK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-POSTLINK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-POSTLINK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 8328e4e69b0b9..80902350829cf 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -52,39 +52,30 @@
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
-; CHECK-O-NEXT: Running analysis: TypeBasedAA
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -93,10 +84,6 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
-; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
-; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
-; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -162,7 +149,6 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 0a1e152932da8..6113c1f6d3657 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -61,40 +61,29 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
-; CHECK-O-NEXT: Invalidating analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
-; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis on foo
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis on foo
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA on foo
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
-; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -103,10 +92,6 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -172,7 +157,6 @@
; CHECK-O-NEXT: Running pass: SimplifyTypeTestsPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on bar
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index a2248bc4d7f4f..2d1348e84b09e 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -99,9 +99,8 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -109,30 +108,22 @@
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass on (foo)
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA
-; CHECK-O-NEXT: Running analysis: TypeBasedAA
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -149,10 +140,8 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
-; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
-; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
@@ -206,7 +195,6 @@
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-EXT: Running pass: {{.*}}::Bye
; CHECK-EP-OPT-EARLY-NEXT: Running pass: NoOpModulePass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index e26e1eafbfb65..cefdad7eaf4f7 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -89,9 +89,7 @@
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
@@ -106,13 +104,11 @@
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
@@ -132,7 +128,6 @@
; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index e2db335d39397..49a81cd41a24d 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -67,40 +67,29 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: InnerAnalysisManagerProxy
-; CHECK-O-NEXT: Invalidating analysis: LazyCallGraphAnalysis
+; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
-; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
; CHECK-O-NEXT: Running analysis: GlobalsAA
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}AAManager
+; CHECK-O-NEXT: Invalidating analysis: AAManager
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
-; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis on [module]
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
; CHECK-O-NEXT: Running pass: InlinerPass
-; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O3-NEXT: Running pass: ArgumentPromotionPass
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
-; CHECK-O-NEXT: Running analysis: AssumptionAnalysis on foo
-; CHECK-O-NEXT: Running analysis: TargetIRAnalysis on foo
; CHECK-O-NEXT: Running pass: EarlyCSEPass
; CHECK-O-NEXT: Running analysis: MemorySSAAnalysis
; CHECK-O-NEXT: Running analysis: AAManager
-; CHECK-O-NEXT: Running analysis: BasicAA on foo
-; CHECK-O-NEXT: Running analysis: ScopedNoAliasAA on foo
-; CHECK-O-NEXT: Running analysis: TypeBasedAA on foo
-; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
@@ -109,10 +98,6 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
-; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
-; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
-; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -173,7 +158,6 @@
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on bar
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: AnnotationRemarksPass on foo
; CHECK-O-NEXT: Running pass: CanonicalizeAliasesPass
>From a3e0d72570ba34568b2bd1bb5fe3afc2600679d6 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 7 Aug 2025 21:22:15 -0400
Subject: [PATCH 21/27] Fix layering violation
Suggested at
<https://github.com/llvm/llvm-project/pull/148758#issuecomment-3141478969>.
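
The shape of the fix can be sketched in a single standalone file, assuming
the usual split between a declaration in a lower-layer header and a
definition in the matching source file. The `toy` namespace and
`isEstimatedTripCountName` are hypothetical names used only for this
example. Only the string value is taken from the patch:

```cpp
#include <cassert>
#include <cstring>

// Header part of the lower layer: declaration only, so that both the
// lower layer and higher layers can name the string without including
// a higher-layer header.
namespace toy {
extern const char *LoopEstimatedTripCount;
} // namespace toy

// Source part of the lower layer: the single definition.
namespace toy {
const char *LoopEstimatedTripCount = "llvm.loop.estimated_trip_count";
} // namespace toy

// A user in any layer depends only downward, on the declaration above.
bool isEstimatedTripCountName(const char *Name) {
  return std::strcmp(Name, toy::LoopEstimatedTripCount) == 0;
}
```

With the declaration in the lower layer, a verifier-like component can
compare metadata names without pulling in transform utilities.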
---
llvm/include/llvm/IR/ProfDataUtils.h | 4 ++++
llvm/include/llvm/Transforms/Utils/LoopUtils.h | 2 --
llvm/lib/IR/ProfDataUtils.cpp | 1 +
llvm/lib/IR/Verifier.cpp | 1 -
4 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/llvm/include/llvm/IR/ProfDataUtils.h b/llvm/include/llvm/IR/ProfDataUtils.h
index ca56e4aa81575..4f5b661a0e0cc 100644
--- a/llvm/include/llvm/IR/ProfDataUtils.h
+++ b/llvm/include/llvm/IR/ProfDataUtils.h
@@ -30,6 +30,10 @@ struct MDProfLabels {
LLVM_ABI static const char *UnknownBranchWeightsMarker;
};
+/// Profile-based loop metadata that should be accessed only by using
+/// \c llvm::getLoopEstimatedTripCount and \c llvm::setLoopEstimatedTripCount.
+LLVM_ABI extern const char *LLVMLoopEstimatedTripCount;
+
/// Checks if an Instruction has MD_prof Metadata
LLVM_ABI bool hasProfMD(const Instruction &I);
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 673996002f0a0..8bb4c1271adb9 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -52,8 +52,6 @@ typedef std::pair<const RuntimeCheckingPtrGroup *,
template <typename T, unsigned N> class SmallSetVector;
template <typename T, unsigned N> class SmallPriorityWorklist;
-const char *const LLVMLoopEstimatedTripCount = "llvm.loop.estimated_trip_count";
-
LLVM_ABI BasicBlock *InsertPreheaderForLoop(Loop *L, DominatorTree *DT,
LoopInfo *LI,
MemorySSAUpdater *MSSAU,
diff --git a/llvm/lib/IR/ProfDataUtils.cpp b/llvm/lib/IR/ProfDataUtils.cpp
index b1b5f67689e6d..a40cdaafbd567 100644
--- a/llvm/lib/IR/ProfDataUtils.cpp
+++ b/llvm/lib/IR/ProfDataUtils.cpp
@@ -95,6 +95,7 @@ const char *MDProfLabels::FunctionEntryCount = "function_entry_count";
const char *MDProfLabels::SyntheticFunctionEntryCount =
"synthetic_function_entry_count";
const char *MDProfLabels::UnknownBranchWeightsMarker = "unknown";
+const char *LLVMLoopEstimatedTripCount = "llvm.loop.estimated_trip_count";
bool hasProfMD(const Instruction &I) {
return I.hasMetadata(LLVMContext::MD_prof);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index ba52cbffbcb01..e65493b15c3e6 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -121,7 +121,6 @@
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/ModRef.h"
#include "llvm/Support/raw_ostream.h"
-#include "llvm/Transforms/Utils/LoopUtils.h"
#include <algorithm>
#include <cassert>
#include <cstdint>
>From 67f22cda70c379df3a35d6188d06ad5bac39d380 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Fri, 8 Aug 2025 14:46:15 -0400
Subject: [PATCH 22/27] Apply clang-format
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 462b70b748918..ae1e2bcca5246 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1240,8 +1240,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(AssignGUIDPass());
if (IsCtxProfUse) {
MPM.addPass(PGOCtxProfFlatteningPass(/*IsPreThinlink=*/true));
- MPM.addPass(createModuleToFunctionPassAdaptor(
- PGOEstimateTripCountsPass()));
+ MPM.addPass(
+ createModuleToFunctionPassAdaptor(PGOEstimateTripCountsPass()));
return MPM;
}
// Block further inlining in the instrumented ctxprof case. This avoids
>From 680bdc22b40e3249ef5a04c6fb22efab4c2addc9 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 18 Aug 2025 19:10:48 -0400
Subject: [PATCH 23/27] Remove PGOEstimateTripCountsPass and no-value form of
metadata
---
llvm/docs/LangRef.rst | 53 ++--
llvm/include/llvm/Analysis/LoopInfo.h | 10 +-
.../Instrumentation/PGOEstimateTripCounts.h | 24 --
.../include/llvm/Transforms/Utils/LoopUtils.h | 59 ++---
llvm/lib/Analysis/LoopInfo.cpp | 10 +-
llvm/lib/IR/Verifier.cpp | 17 +-
llvm/lib/Passes/PassBuilder.cpp | 1 -
llvm/lib/Passes/PassBuilderPipelines.cpp | 9 +-
llvm/lib/Passes/PassRegistry.def | 1 -
.../Transforms/Instrumentation/CMakeLists.txt | 1 -
.../Instrumentation/PGOEstimateTripCounts.cpp | 46 ----
llvm/lib/Transforms/Utils/LoopUtils.cpp | 91 +++----
llvm/test/Other/new-pm-defaults.ll | 9 +-
.../Other/new-pm-thinlto-postlink-defaults.ll | 10 +-
.../new-pm-thinlto-postlink-pgo-defaults.ll | 3 -
...-pm-thinlto-postlink-samplepgo-defaults.ll | 4 +-
.../Other/new-pm-thinlto-prelink-defaults.ll | 9 +-
.../new-pm-thinlto-prelink-pgo-defaults.ll | 5 +-
...w-pm-thinlto-prelink-samplepgo-defaults.ll | 3 -
.../Transforms/LoopUnroll/unroll-cleanup.ll | 12 +-
.../LoopVectorize/X86/already-vectorized.ll | 5 +-
.../LoopVectorize/vect.omp.persistence.ll | 3 +-
.../PGOProfile/pgo-estimate-trip-counts.ll | 240 ------------------
.../AArch64/block_scaling_decompr_8bit.ll | 9 +-
.../AArch64/extra-unroll-simplifications.ll | 9 +-
.../AArch64/hoist-runtime-checks.ll | 25 +-
.../AArch64/indvars-vectorization.ll | 11 +-
.../PhaseOrdering/AArch64/interleave_vec.ll | 40 ++-
.../AArch64/predicated-reduction.ll | 26 +-
.../Transforms/PhaseOrdering/X86/pr88239.ll | 7 +-
.../X86/preserve-access-group.ll | 11 +-
.../PhaseOrdering/branch-dom-cond.ll | 6 +-
.../llvm.loop.estimated_trip_count.ll | 9 +-
33 files changed, 169 insertions(+), 609 deletions(-)
delete mode 100644 llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
delete mode 100644 llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
delete mode 100644 llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
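
The lookup rule that this patch documents in LangRef (use `llvm.loop.estimated_trip_count` when present, otherwise estimate from the latch `branch_weights`) can be modeled with a small standalone sketch. This is not LLVM code: `LatchWeights`, `LoopModel`, and the `model*` functions are hypothetical stand-ins for illustration only, and rounding the exit count to nearest is an assumption of the sketch. The weight arithmetic mirrors the `setLoopEstimatedTripCount` hunk in `LoopUtils.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-in for the latch branch's branch_weights metadata.
struct LatchWeights {
  uint64_t Backedge; // weight of the edge that stays in the loop
  uint64_t Exit;     // weight of the edge that leaves the loop
};

// Hypothetical stand-in for the profile-related state of one loop.
struct LoopModel {
  std::optional<unsigned> TripCountMD; // llvm.loop.estimated_trip_count
  std::optional<LatchWeights> Weights; // absent if no suitable latch branch
};

// Fallback estimate: one plus the estimated exit count, saturated at UINT_MAX.
// Rounding to nearest is an assumption of this sketch.
std::optional<unsigned> modelEstimateFromWeights(const LoopModel &L) {
  if (!L.Weights || L.Weights->Exit == 0)
    return std::nullopt;
  uint64_t ExitCount =
      (L.Weights->Backedge + L.Weights->Exit / 2) / L.Weights->Exit;
  uint64_t TC = ExitCount + 1;
  uint64_t Max = std::numeric_limits<unsigned>::max();
  return static_cast<unsigned>(TC < Max ? TC : Max);
}

// Metadata, once present, takes precedence over the branch weights.
std::optional<unsigned> modelGetTripCount(const LoopModel &L) {
  if (L.TripCountMD)
    return L.TripCountMD;
  return modelEstimateFromWeights(L);
}

// Always records the metadata. Rewrites the latch weights only when an
// invocation weight is supplied, and reports failure if there is no latch
// branch to update.
bool modelSetTripCount(LoopModel &L, unsigned TC,
                       std::optional<unsigned> InvocationWeight = std::nullopt) {
  L.TripCountMD = TC;
  if (!InvocationWeight)
    return true;
  if (!L.Weights)
    return false;
  uint64_t ExitW = TC != 0 ? *InvocationWeight : 0;
  uint64_t BackW = TC != 0 ? (uint64_t(TC) - 1) * ExitW : 0;
  L.Weights = LatchWeights{BackW, ExitW};
  return true;
}

// Usage with the 9:1 weights from the example in the PR description.
void modelExample() {
  LoopModel L{std::nullopt, LatchWeights{9, 1}};
  assert(modelGetTripCount(L) == 10u); // no metadata: estimated from weights
  modelSetTripCount(L, 8);             // record a new estimate, keep weights
  assert(modelGetTripCount(L) == 8u);  // metadata now wins over the weights
}
```

The point of the model is the second assertion: after a pass records its own estimate, later changes to `branch_weights` made to keep block frequencies accurate no longer affect the trip count that consumers see.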
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 6d366b1cd3a31..1f2f7349099ac 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7971,13 +7971,12 @@ loop distribution pass. See
This metadata records an estimated trip count for the loop. The first operand
is the string ``llvm.loop.estimated_trip_count``. The second operand is an
-integer constant of type ``i32`` or smaller specifying the count, which might be
-omitted for the reasons described below. For example:
+integer constant of type ``i32`` or smaller specifying the estimate. For
+example:
.. code-block:: llvm
!0 = !{!"llvm.loop.estimated_trip_count", i32 8}
- !1 = !{!"llvm.loop.estimated_trip_count"}
Purpose
"""""""
@@ -7994,38 +7993,26 @@ determine how many iterations to peel or how aggressively to unroll.
Initialization and Maintenance
""""""""""""""""""""""""""""""
-The ``pgo-estimate-trip-counts`` pass typically runs immediately after profile
-ingestion to add this metadata to all loops. It estimates each loop's trip
-count from the loop's ``branch_weights`` metadata. This way of initially
-estimating trip counts appears to be useful for the passes that consume them.
-
-As passes transform existing loops and create new loops, they must be free to
-update and create ``branch_weights`` metadata to maintain accurate block
-frequencies. Trip counts estimated from this new ``branch_weights`` metadata
-are not necessarily useful to the passes that consume them. In general, when
-passes transform and create loops, they should separately estimate new trip
-counts from previously estimated trip counts, and they should record them by
-creating or updating this metadata. For this or any other work involving
-estimated trip counts, passes should always call
+Passes should always interact with estimated trip counts via
``llvm::getLoopEstimatedTripCount`` and ``llvm::setLoopEstimatedTripCount``.
-Missing Metadata and Values
-"""""""""""""""""""""""""""
-
-If the current implementation of ``pgo-estimate-trip-counts`` cannot estimate a
-trip count from the loop's ``branch_weights`` metadata due to the loop's form or
-due to missing profile data, it creates this metadata for the loop but omits the
-value. This situation is currently common (e.g., the LLVM IR loop that Clang
-emits for a simple C ``for`` loop). A later pass (e.g., ``loop-rotate``) might
-modify the loop's form in a way that enables estimating its trip count even if
-those modifications provably never impact the actual number of loop iterations.
-That later pass should then add an appropriate value to the metadata.
-
-However, not all such passes currently do so. Thus, if this metadata has no
-value, ``llvm::getLoopEstimatedTripCount`` will disregard it and estimate the
-trip count from the loop's ``branch_weights`` metadata. It does the same when
-the metadata is missing altogether, perhaps because ``pgo-estimate-trip-counts``
-was not specified in a minimal pass list to a tool like ``opt``.
+When the ``llvm.loop.estimated_trip_count`` metadata is not present on a loop,
+``llvm::getLoopEstimatedTripCount`` estimates the loop's trip count from the
+loop's ``branch_weights`` metadata under the assumption that the latter still
+accurately encodes the program's original profile data. However, as passes
+transform existing loops and create new loops, they must be free to update and
+create ``branch_weights`` metadata in a way that maintains accurate block
+frequencies. Trip counts estimated from this new ``branch_weights`` metadata
+are not necessarily useful to the passes that consume estimated trip counts.
+
+For this reason, when a pass transforms or creates loops, the pass should
+separately estimate new trip counts based on the estimated trip counts that
+``llvm::getLoopEstimatedTripCount`` returns at the start of the pass, and the
+pass should record the new estimates by calling
+``llvm::setLoopEstimatedTripCount``, which creates or updates
+``llvm.loop.estimated_trip_count`` metadata. Once this metadata is present on a
+loop, ``llvm::getLoopEstimatedTripCount`` returns its value instead of
+estimating the trip count from the loop's ``branch_weights`` metadata.
'``llvm.licm.disable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Analysis/LoopInfo.h b/llvm/include/llvm/Analysis/LoopInfo.h
index 0212e1eae61ff..f80744e70f7ad 100644
--- a/llvm/include/llvm/Analysis/LoopInfo.h
+++ b/llvm/include/llvm/Analysis/LoopInfo.h
@@ -638,13 +638,9 @@ LLVM_ABI std::optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,
/// Returns true if Name is applied to TheLoop and enabled.
LLVM_ABI bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name);
-/// Find named metadata for a loop with an integer value. Return
-/// \c std::nullopt if the metadata has no value or is missing altogether. If
-/// \p Missing, set \c *Missing to indicate whether the metadata is missing
-/// altogether.
-LLVM_ABI std::optional<int>
-getOptionalIntLoopAttribute(const Loop *TheLoop, StringRef Name,
- bool *Missing = nullptr);
+/// Find named metadata for a loop with an integer value.
+LLVM_ABI std::optional<int> getOptionalIntLoopAttribute(const Loop *TheLoop,
+ StringRef Name);
/// Find named metadata for a loop with an integer value. Return \p Default if
/// not set.
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
deleted file mode 100644
index 529e79c31e12b..0000000000000
--- a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
+++ /dev/null
@@ -1,24 +0,0 @@
-//===- PGOEstimateTripCounts.h ----------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
-#define LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
-
-#include "llvm/IR/PassManager.h"
-
-namespace llvm {
-
-struct PGOEstimateTripCountsPass
- : public PassInfoMixin<PGOEstimateTripCountsPass> {
- PGOEstimateTripCountsPass() {}
- PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
-};
-
-} // namespace llvm
-
-#endif // LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index e1b071faae3d9..0ee22a24e56c5 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -317,17 +317,15 @@ LLVM_ABI TransformationMode hasDistributeTransformation(const Loop *L);
LLVM_ABI TransformationMode hasLICMVersioningTransformation(const Loop *L);
/// @}
-/// Set the string \p MDString into the loop metadata of \p TheLoop while
-/// keeping other loop metadata intact. Set \p *V as its value, or set it
-/// without a value if \p V is \c std::nullopt to indicate the value is unknown.
-/// If \p MDString is already in the loop metadata, update it if its value (or
-/// lack of value) is different. Return true if metadata was changed.
-LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
- std::optional<unsigned> V = 0);
+/// Set the input string into the loop metadata while keeping other values
+/// intact. If the string is already in the loop metadata, update its value if
+/// it is different.
+LLVM_ABI void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
+ unsigned V = 0);
/// Return either:
/// - The value of \c llvm.loop.estimated_trip_count from the loop metadata of
-/// \p L, if that metadata is present and has a value.
+/// \p L, if that metadata is present.
/// - Else, a new estimate of the trip count from the latch branch weights of
/// \p L, if the estimation's implementation is able to handle the loop form
/// of \p L (e.g., \p L must have a latch block that controls the loop exit).
@@ -336,17 +334,10 @@ LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
/// An estimated trip count is always a valid positive trip count, saturated at
/// \c UINT_MAX.
///
-/// Via \c LLVM_DEBUG, emit diagnostics that include "WARNING" when the metadata
-/// is in an unexpected state as that indicates some transformation has
-/// corrupted it. If \p DbgForInit, expect the metadata to be missing.
-/// Otherwise, expect the metadata to be present, and expect it to have no value
-/// only if the trip count is currently inestimable from the latch branch
-/// weights.
-///
/// In addition, if \p EstimatedLoopInvocationWeight, then either:
-/// - Set \p *EstimatedLoopInvocationWeight to the weight of the latch's branch
+/// - Set \c *EstimatedLoopInvocationWeight to the weight of the latch's branch
/// to the loop exit.
-/// - Do not set it and return \c std::nullopt if the current implementation
+/// - Do not set it, and return \c std::nullopt, if the current implementation
/// cannot compute that weight (e.g., if \p L does not have a latch block that
/// controls the loop exit) or the weight is zero (because zero cannot be
/// used to compute new branch weights that reflect the estimated trip count).
@@ -354,43 +345,27 @@ LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
/// TODO: Eventually, once all passes have migrated away from setting branch
/// weights to indicate estimated trip counts, this function will drop the
/// \p EstimatedLoopInvocationWeight parameter.
-///
-/// TODO: There are also passes that currently do not consider estimated trip
-/// counts at all but that, for example, affect whether trip counts can be
-/// estimated from branch weights. Once all such passes have been adjusted to
-/// update this metadata, this function might stop estimating trip counts from
-/// branch weights and instead simply get the \c llvm.loop_estimated_trip_count
-/// metadata. See also the \c llvm.loop.estimated_trip_count entry in
-/// \c LangRef.rst.
LLVM_ABI std::optional<unsigned>
getLoopEstimatedTripCount(Loop *L,
- unsigned *EstimatedLoopInvocationWeight = nullptr,
- bool DbgForInit = false);
-
-/// Set \c llvm.loop.estimated_trip_count with the value \c *EstimatedTripCount
-/// in the loop metadata of \p L, or set it without a value if
-/// \c !EstimatedTripCount to indicate that \c getLoopEstimatedTripCount cannot
-/// estimate the trip count from latch branch weights. If
-/// \c !EstimatedTripCount but \c getLoopEstimatedTripCount can estimate the
-/// trip counts, future calls to \c getLoopEstimatedTripCount will diagnose the
-/// metadata as corrupt.
+ unsigned *EstimatedLoopInvocationWeight = nullptr);
+
+/// Set \c llvm.loop.estimated_trip_count with the value \p EstimatedTripCount
+/// in the loop metadata of \p L.
///
/// In addition, if \p EstimatedLoopInvocationWeight, set the branch weight
/// metadata of \p L to reflect that \p L has an estimated
-/// \c *EstimatedTripCount iterations and has \c *EstimatedLoopInvocationWeight
+/// \p EstimatedTripCount iterations and has \c *EstimatedLoopInvocationWeight
/// exit weight through the loop's latch.
///
-/// Return false if \c llvm.loop.estimated_trip_count was already set according
-/// to \p EstimatedTripCount and so was not updated. Return false if
-/// \p EstimatedLoopInvocationWeight and if branch weight metadata could not be
-/// successfully updated (e.g., if \p L does not have a latch block that
-/// controls the loop exit). Otherwise, return true.
+/// Return false if \p EstimatedLoopInvocationWeight is specified but branch
+/// weight metadata could not be updated (e.g., if \p L does not have a latch
+/// block that controls the loop exit). Otherwise, return true.
///
/// TODO: Eventually, once all passes have migrated away from setting branch
/// weights to indicate estimated trip counts, this function will drop the
/// \p EstimatedLoopInvocationWeight parameter.
LLVM_ABI bool setLoopEstimatedTripCount(
- Loop *L, std::optional<unsigned> EstimatedTripCount,
+ Loop *L, unsigned EstimatedTripCount,
std::optional<unsigned> EstimatedLoopInvocationWeight = std::nullopt);
/// Check inner loop (L) backedge count is known to be invariant on all
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 44b49beb14ee7..6ba6073cce950 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -1123,13 +1123,9 @@ bool llvm::getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {
}
std::optional<int> llvm::getOptionalIntLoopAttribute(const Loop *TheLoop,
- StringRef Name,
- bool *Missing) {
- std::optional<const MDOperand *> AttrMDOpt =
- findStringMetadataForLoop(TheLoop, Name);
- if (Missing)
- *Missing = !AttrMDOpt;
- const MDOperand *AttrMD = AttrMDOpt.value_or(nullptr);
+ StringRef Name) {
+ const MDOperand *AttrMD =
+ findStringMetadataForLoop(TheLoop, Name).value_or(nullptr);
if (!AttrMD)
return std::nullopt;
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 222ed843e1cf2..f1fad207661b1 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -1077,16 +1077,13 @@ void Verifier::visitMDNode(const MDNode &MD, AreDebugLocsAllowed AllowLocs) {
// Check llvm.loop.estimated_trip_count.
if (MD.getNumOperands() > 0 &&
MD.getOperand(0).equalsStr(LLVMLoopEstimatedTripCount)) {
- Check(MD.getNumOperands() == 1 || MD.getNumOperands() == 2,
- "Expected one or two operands", &MD);
- if (MD.getNumOperands() == 2) {
- auto *Count = dyn_cast_or_null<ConstantAsMetadata>(MD.getOperand(1));
- Check(Count && Count->getType()->isIntegerTy() &&
- cast<IntegerType>(Count->getType())->getBitWidth() <= 32,
- "Expected optional second operand to be an integer constant of "
- "type i32 or smaller",
- &MD);
- }
+ Check(MD.getNumOperands() == 2, "Expected two operands", &MD);
+ auto *Count = dyn_cast_or_null<ConstantAsMetadata>(MD.getOperand(1));
+ Check(Count && Count->getType()->isIntegerTy() &&
+ cast<IntegerType>(Count->getType())->getBitWidth() <= 32,
+ "Expected second operand to be an integer constant of type i32 or "
+ "smaller",
+ &MD);
}
// Check these last, so we diagnose problems in operands first.
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8cba3fb618f78..b7edeea082762 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -251,7 +251,6 @@
#include "llvm/Transforms/Instrumentation/NumericalStabilitySanitizer.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
-#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
#include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Instrumentation/RealtimeSanitizer.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index ae1e2bcca5246..98821bb1408a7 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -80,7 +80,6 @@
#include "llvm/Transforms/Instrumentation/MemProfUse.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
#include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
-#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
#include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Scalar/ADCE.h"
@@ -1240,8 +1239,6 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(AssignGUIDPass());
if (IsCtxProfUse) {
MPM.addPass(PGOCtxProfFlatteningPass(/*IsPreThinlink=*/true));
- MPM.addPass(
- createModuleToFunctionPassAdaptor(PGOEstimateTripCountsPass()));
return MPM;
}
// Block further inlining in the instrumented ctxprof case. This avoids
@@ -1271,10 +1268,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
MPM.addPass(MemProfUsePass(PGOOpt->MemoryProfile, PGOOpt->FS));
if (PGOOpt && (PGOOpt->Action == PGOOptions::IRUse ||
- PGOOpt->Action == PGOOptions::SampleUse)) {
+ PGOOpt->Action == PGOOptions::SampleUse))
MPM.addPass(PGOForceFunctionAttrsPass(PGOOpt->ColdOptType));
- }
- MPM.addPass(createModuleToFunctionPassAdaptor(PGOEstimateTripCountsPass()));
MPM.addPass(AlwaysInlinerPass(/*InsertLifetimeIntrinsics=*/true));
@@ -2360,4 +2355,4 @@ AAManager PassBuilder::buildDefaultAAPipeline() {
bool PassBuilder::isInstrumentedPGOUse() const {
return (PGOOpt && PGOOpt->Action == PGOOptions::IRUse) ||
!UseCtxProfile.empty();
-}
+}
\ No newline at end of file
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index ab869b5ffb4b4..1b111dc20d35c 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -487,7 +487,6 @@ FUNCTION_PASS("objc-arc-contract", ObjCARCContractPass())
FUNCTION_PASS("objc-arc-expand", ObjCARCExpandPass())
FUNCTION_PASS("pa-eval", PAEvalPass())
FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass())
-FUNCTION_PASS("pgo-estimate-trip-counts", PGOEstimateTripCountsPass())
FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
FUNCTION_PASS("place-safepoints", PlaceSafepointsPass())
FUNCTION_PASS("print", PrintFunctionPass(errs()))
diff --git a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
index 0a97ed4b51e69..15fd421a41b0f 100644
--- a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -16,7 +16,6 @@ add_llvm_component_library(LLVMInstrumentation
LowerAllowCheckPass.cpp
PGOCtxProfFlattening.cpp
PGOCtxProfLowering.cpp
- PGOEstimateTripCounts.cpp
PGOForceFunctionAttrs.cpp
PGOInstrumentation.cpp
PGOMemOPSizeOpt.cpp
diff --git a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
deleted file mode 100644
index 7e6eb52dcaebe..0000000000000
--- a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
+++ /dev/null
@@ -1,46 +0,0 @@
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
-#include "llvm/Analysis/LazyCallGraph.h"
-#include "llvm/Analysis/LoopInfo.h"
-#include "llvm/IR/Module.h"
-#include "llvm/Transforms/Utils/LoopUtils.h"
-
-using namespace llvm;
-
-#define DEBUG_TYPE "pgo-estimate-trip-counts"
-
-static bool runOnLoop(Loop *L) {
- bool MadeChange = false;
- std::optional<unsigned> TC = getLoopEstimatedTripCount(
- L, /*EstimatedLoopInvocationWeight=*/nullptr, /*DbgForInit=*/true);
- MadeChange |= setLoopEstimatedTripCount(L, TC);
- for (Loop *SL : *L)
- MadeChange |= runOnLoop(SL);
- return MadeChange;
-}
-
-PreservedAnalyses PGOEstimateTripCountsPass::run(Function &F,
- FunctionAnalysisManager &FAM) {
- bool MadeChange = false;
- LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": start\n");
- LoopInfo *LI = &FAM.getResult<LoopAnalysis>(F);
- if (!LI)
- return PreservedAnalyses::all();
- for (Loop *L : *LI)
- MadeChange |= runOnLoop(L);
- LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": end\n");
- if (MadeChange) {
- PreservedAnalyses PA;
- PA.preserveSet<CFGAnalyses>();
- PA.preserve<LazyCallGraphAnalysis>();
- return PA;
- }
- return PreservedAnalyses::all();
-}
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 5f48ebff5f083..38d249a50b0a8 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -201,40 +201,34 @@ void llvm::initializeLoopPassPass(PassRegistry &Registry) {
}
/// Create MDNode for input string.
-static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name,
- std::optional<unsigned> V) {
+static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name, unsigned V) {
LLVMContext &Context = TheLoop->getHeader()->getContext();
- if (V) {
- Metadata *MDs[] = {MDString::get(Context, Name),
- ConstantAsMetadata::get(
- ConstantInt::get(Type::getInt32Ty(Context), *V))};
- return MDNode::get(Context, MDs);
- }
- return MDNode::get(Context, {MDString::get(Context, Name)});
+ Metadata *MDs[] = {
+ MDString::get(Context, Name),
+ ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Context), V))};
+ return MDNode::get(Context, MDs);
}
-bool llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
- std::optional<unsigned> V) {
+/// Set the input string into the loop metadata while keeping other values
+/// intact. If the string is already in the loop metadata, update its value if
+/// it is different.
+void llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
+ unsigned V) {
SmallVector<Metadata *, 4> MDs(1);
// If the loop already has metadata, retain it.
MDNode *LoopID = TheLoop->getLoopID();
if (LoopID) {
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
MDNode *Node = cast<MDNode>(LoopID->getOperand(i));
- // If it is of form key [= value], try to parse it.
- unsigned NumOps = Node->getNumOperands();
- if (NumOps == 1 || NumOps == 2) {
+ // If it is of form key = value, try to parse it.
+ if (Node->getNumOperands() == 2) {
MDString *S = dyn_cast<MDString>(Node->getOperand(0));
if (S && S->getString() == StringMD) {
- // If the metadata and any value are already as specified, do nothing.
- if (NumOps == 2 && V) {
- ConstantInt *IntMD =
- mdconst::extract_or_null<ConstantInt>(Node->getOperand(1));
- if (IntMD && IntMD->getSExtValue() == *V)
- return false;
- } else if (NumOps == 1 && !V) {
- return false;
- }
+ ConstantInt *IntMD =
+ mdconst::extract_or_null<ConstantInt>(Node->getOperand(1));
+ if (IntMD && IntMD->getSExtValue() == V)
+ // It is already in place. Do nothing.
+ return;
// We need to update the value, so just skip it here and it will
// be added after copying other existed nodes.
continue;
@@ -251,7 +245,6 @@ bool llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);
TheLoop->setLoopID(NewLoopID);
- return true;
}
std::optional<ElementCount>
@@ -867,13 +860,14 @@ static std::optional<unsigned> estimateLoopTripCount(Loop *L) {
// Estimated trip count is one plus estimated exit count.
uint64_t TC = ExitCount + 1;
- LLVM_DEBUG(dbgs() << "estimateLoopTripCount: estimated trip count of " << TC
+ LLVM_DEBUG(dbgs() << "estimateLoopTripCount: Estimated trip count of " << TC
<< " for " << DbgLoop(L) << "\n");
return TC;
}
-std::optional<unsigned> llvm::getLoopEstimatedTripCount(
- Loop *L, unsigned *EstimatedLoopInvocationWeight, bool DbgForInit) {
+std::optional<unsigned>
+llvm::getLoopEstimatedTripCount(Loop *L,
+ unsigned *EstimatedLoopInvocationWeight) {
// If requested, either compute *EstimatedLoopInvocationWeight or return
// nullopt if cannot.
//
@@ -895,9 +889,7 @@ std::optional<unsigned> llvm::getLoopEstimatedTripCount(
// Return the estimated trip count from metadata unless the metadata is
// missing or has no value.
- bool Missing = false; // Initialization is expected to be unused.
- if (auto TC = getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount,
- &Missing)) {
+ if (auto TC = getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount)) {
LLVM_DEBUG(dbgs() << "getLoopEstimatedTripCount: "
<< LLVMLoopEstimatedTripCount << " metadata has trip "
<< "count of " << *TC << " for " << DbgLoop(L) << "\n");
@@ -905,39 +897,14 @@ std::optional<unsigned> llvm::getLoopEstimatedTripCount(
}
// Estimate the trip count from latch branch weights.
- std::optional<unsigned> TC = estimateLoopTripCount(L);
- if (DbgForInit) {
- // We expect no existing metadata as we are responsible for creating it.
- LLVM_DEBUG(dbgs() << (Missing ? "" : "WARNING: ")
- << "getLoopEstimatedTripCount: "
- << LLVMLoopEstimatedTripCount << " metadata "
- << (Missing ? "" : "not ") << "missing as expected "
- << "during its init for " << DbgLoop(L) << "\n");
- } else if (Missing) {
- // We expect that metadata was already created.
- LLVM_DEBUG(dbgs() << "WARNING: getLoopEstimatedTripCount: "
- << LLVMLoopEstimatedTripCount << " metadata missing for "
- << DbgLoop(L) << "\n");
- } else {
- // If the trip count is estimable, the value should have been added already.
- LLVM_DEBUG(dbgs() << (TC ? "WARNING: " : "")
- << "getLoopEstimatedTripCount: "
- << LLVMLoopEstimatedTripCount << " metadata "
- << (TC ? "incorrectly " : "correctly ")
- << "indicates trip count is inestimable for "
- << DbgLoop(L) << "\n");
- }
- return TC;
+ return estimateLoopTripCount(L);
}
bool llvm::setLoopEstimatedTripCount(
- Loop *L, std::optional<unsigned> EstimatedTripCount,
+ Loop *L, unsigned EstimatedTripCount,
std::optional<unsigned> EstimatedloopInvocationWeight) {
// Set the metadata.
- bool Updated = addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount,
- EstimatedTripCount);
- if (!EstimatedTripCount || !EstimatedloopInvocationWeight)
- return Updated;
+ addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount, EstimatedTripCount);
// At the moment, we currently support changing the estimated trip count in
// the latch branch's branch weights only. We could extend this API to
@@ -946,6 +913,8 @@ bool llvm::setLoopEstimatedTripCount(
// TODO: Eventually, once all passes have migrated away from setting branch
// weights to indicate estimated trip counts, we will not set branch weights
// here at all.
+ if (!EstimatedloopInvocationWeight)
+ return true;
BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
if (!LatchBranch)
return false;
@@ -954,9 +923,9 @@ bool llvm::setLoopEstimatedTripCount(
unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;
- if (*EstimatedTripCount != 0) {
+ if (EstimatedTripCount != 0) {
LatchExitWeight = *EstimatedloopInvocationWeight;
- BackedgeTakenWeight = (*EstimatedTripCount - 1) * LatchExitWeight;
+ BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
}
// Make a swap if back edge is taken when condition is "false".
@@ -970,7 +939,7 @@ bool llvm::setLoopEstimatedTripCount(
LLVMContext::MD_prof,
MDB.createBranchWeights(BackedgeTakenWeight, LatchExitWeight));
- return Updated;
+ return true;
}
bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 1b8ae9cbc2dc8..c554fdbf4c799 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -6,9 +6,6 @@
; to be invalidated.
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
-;
-; FIXME: Invalidation has been showing up for years. Something needs to be
-; fixed, perhaps just the above comment.
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='default<O1>' -S %s 2>&1 \
@@ -134,9 +131,6 @@
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-EP-PEEPHOLE-NEXT: Running pass: NoOpFunctionPass
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
@@ -170,7 +164,6 @@
; CHECK-JUMP-TABLE-TO-SWITCH-NEXT: Running pass: JumpTableToSwitchPass
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -180,8 +173,10 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
+; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 9a4eb0b1a7499..62bb02d9b3c40 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -7,9 +7,6 @@
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
;
-; FIXME: Invalidation has been showing up for years. Something needs to be
-; fixed, perhaps just the above comment.
-;
; Postlink pipelines:
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto<O1>' -S %s 2>&1 \
@@ -65,10 +62,8 @@
; CHECK-O-NEXT: Running analysis: TypeBasedAA
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
+; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
@@ -98,7 +93,6 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -107,8 +101,10 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
+; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 80902350829cf..0da7a9f73bdce 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -51,8 +51,6 @@
; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -83,7 +81,6 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 6113c1f6d3657..38b7890682783 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -59,9 +59,8 @@
; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
+
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -91,7 +90,6 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 2d1348e84b09e..5aacd26def2be 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -7,9 +7,6 @@
; Any invalidation that shows up here is a bug, unless we started modifying
; the IR, in which case we need to make it immutable harder.
;
-; FIXME: Invalidation has been showing up for years. Something needs to be
-; fixed, perhaps just the above comment.
-;
; Prelink pipelines:
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O1>' -S %s 2>&1 \
@@ -97,9 +94,6 @@
; CHECK-O-NEXT: Running analysis: TypeBasedAA
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Running analysis: LoopAnalysis
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
@@ -131,7 +125,6 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
@@ -140,8 +133,10 @@
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: ReassociatePass
; CHECK-O23SZ-NEXT: Running pass: ConstraintEliminationPass
+; CHECK-O23SZ-NEXT: Running analysis: LoopAnalysis
; CHECK-O23SZ-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O1-NEXT: Running analysis: LoopAnalysis
; CHECK-O-NEXT: Running pass: LCSSAPass
; CHECK-O1-NEXT: Running analysis: ScalarEvolutionAnalysis
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index cefdad7eaf4f7..f6a9406596803 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -86,9 +86,6 @@
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
-; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -109,6 +106,7 @@
; CHECK-O2-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O3-NEXT: Running pass: OpenMPOptCGSCCPass
; CHECK-O-NEXT: Running pass: SROAPass
+; CHECK-O-NEXT: Running analysis: DominatorTreeAnalysis
; CHECK-O-NEXT: Running analysis: AssumptionAnalysis
; CHECK-O-NEXT: Running analysis: TargetIRAnalysis
; CHECK-O-NEXT: Running pass: EarlyCSEPass
@@ -128,6 +126,7 @@
; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis on foo
; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis on foo
+; CHECK-O-NEXT: Running analysis: LoopAnalysis on foo
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 49a81cd41a24d..48a9433d24999 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -66,8 +66,6 @@
; CHECK-O-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
; CHECK-O-NEXT: Running pass: SimplifyCFGPass on foo
; CHECK-O-NEXT: Running pass: PGOForceFunctionAttrsPass
-; CHECK-O-NEXT: Running pass: PGOEstimateTripCountsPass
-; CHECK-O-NEXT: Invalidating analysis: LastRunTrackingAnalysis
; CHECK-O-NEXT: Running pass: AlwaysInlinerPass
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
@@ -97,7 +95,6 @@
; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
; CHECK-O-NEXT: Running pass: SimplifyCFGPass
; CHECK-O-NEXT: Running pass: InstCombinePass
-; CHECK-O-NEXT: Running analysis: LastRunTrackingAnalysis on foo
; CHECK-O23SZ-NEXT: Running pass: AggressiveInstCombinePass
; CHECK-O1-NEXT: Running pass: LibCallsShrinkWrapPass
; CHECK-O2-NEXT: Running pass: LibCallsShrinkWrapPass
diff --git a/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll b/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
index 67ef89aa51369..3353a59ea1d12 100644
--- a/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
+++ b/llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll
@@ -31,15 +31,15 @@ define void @_Z3fn1v(ptr %r, ptr %a) #0 {
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A_021:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[T2:%.*]], -1
; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
-; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[TMP1]]
+; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[SCEVGEP:%.*]], i64 [[TMP1]]
; CHECK-NEXT: [[T1_PRE:%.*]] = load i32, ptr @b, align 4
; CHECK-NEXT: br label %[[FOR_COND_LOOPEXIT:.*]]
; CHECK: [[FOR_COND_LOOPEXIT]]:
; CHECK-NEXT: [[T1:%.*]] = phi i32 [ [[T12:%.*]], %[[FOR_BODY]] ], [ [[T1_PRE]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
; CHECK-NEXT: [[R_1_LCSSA:%.*]] = phi ptr [ [[R_022:%.*]], %[[FOR_BODY]] ], [ [[ADD_PTR_LCSSA]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
-; CHECK-NEXT: [[A_1_LCSSA:%.*]] = phi ptr [ [[A_021]], %[[FOR_BODY]] ], [ [[SCEVGEP1]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
+; CHECK-NEXT: [[A_1_LCSSA:%.*]] = phi ptr [ [[A_021:%.*]], %[[FOR_BODY]] ], [ [[SCEVGEP1]], %[[FOR_COND_LOOPEXIT_LOOPEXIT]] ]
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[T1]], 0
-; CHECK-NEXT: br i1 [[TOBOOL]], label %[[FOR_END6]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: br i1 [[TOBOOL]], label %[[FOR_END6]], label %[[FOR_BODY]]
; CHECK: [[FOR_BODY]]:
; CHECK-NEXT: [[T12]] = phi i32 [ [[T1]], %[[FOR_COND_LOOPEXIT]] ], [ [[T]], %[[ENTRY]] ]
; CHECK-NEXT: [[R_022]] = phi ptr [ [[R_1_LCSSA]], %[[FOR_COND_LOOPEXIT]] ], [ [[R]], %[[ENTRY]] ]
@@ -106,7 +106,7 @@ define void @_Z3fn1v(ptr %r, ptr %a) #0 {
; CHECK-NEXT: [[INCDEC_PTR_1]] = getelementptr inbounds nuw i8, ptr [[A_116]], i64 2
; CHECK-NEXT: [[ADD_PTR_1]] = getelementptr inbounds nuw i8, ptr [[R_117]], i64 12
; CHECK-NEXT: [[TOBOOL2_1:%.*]] = icmp eq i32 [[DEC18_1]], 0
-; CHECK-NEXT: br i1 [[TOBOOL2_1]], label %[[FOR_COND_LOOPEXIT_LOOPEXIT]], label %[[FOR_BODY3]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-NEXT: br i1 [[TOBOOL2_1]], label %[[FOR_COND_LOOPEXIT_LOOPEXIT]], label %[[FOR_BODY3]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[FOR_END6]]:
; CHECK-NEXT: ret void
;
@@ -174,7 +174,5 @@ for.end6: ; preds = %for.cond.for.end6_c
!1 = !{!"llvm.loop.unroll.count", i32 2}
;.
; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[META1]], [[META3:![0-9]+]]}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[META1]] = !{!"llvm.loop.unroll.disable"}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll b/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
index fba87072f8b24..41fdcf2eb56d0 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/already-vectorized.ll
@@ -41,8 +41,7 @@ for.end: ; preds = %for.body
}
; Now, we check for the Hint metadata
-; CHECK: [[vect]] = distinct !{[[vect]], ![[#etc:]], [[width:![0-9]+]], [[runtime_unroll:![0-9]+]]}
-; CHECK: [[#etc]] = !{!"llvm.loop.estimated_trip_count"}
+; CHECK: [[vect]] = distinct !{[[vect]], [[width:![0-9]+]], [[runtime_unroll:![0-9]+]]}
; CHECK: [[width]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[runtime_unroll]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[scalar]] = distinct !{[[scalar]], ![[#etc]], [[runtime_unroll]], [[width]]}
+; CHECK: [[scalar]] = distinct !{[[scalar]], [[runtime_unroll]], [[width]]}
diff --git a/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll b/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
index 9b031e31944ad..2c268664ef991 100644
--- a/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
+++ b/llvm/test/Transforms/LoopVectorize/vect.omp.persistence.ll
@@ -13,9 +13,8 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3
;
; CHECK-LABEL: @foo
; CHECK: !llvm.loop !0
-; CHECK: !0 = distinct !{!0, !1, !2}
+; CHECK: !0 = distinct !{!0, !1}
; CHECK: !1 = !{!"llvm.loop.vectorize.enable", i1 true}
-; CHECK: !2 = !{!"llvm.loop.estimated_trip_count"}
define i32 @foo(i32 %a) {
entry:
br label %loop_cond
diff --git a/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll b/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
deleted file mode 100644
index f3fe8fb373694..0000000000000
--- a/llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll
+++ /dev/null
@@ -1,240 +0,0 @@
-; Check the pgo-estimate-trip-counts pass. Indirectly check
-; llvm::getLoopEstimatedTripCount and llvm::setLoopEstimatedTripCount.
-
-; RUN: opt %s -S -passes=pgo-estimate-trip-counts 2>&1 | \
-; RUN: FileCheck %s -implicit-check-not='{{^[^ ;]*:}}'
-
-; No metadata and trip count is estimable: create metadata with value.
-;
-; CHECK-LABEL: define void @estimable(i32 %n) {
-define void @estimable(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#ESTIMABLE:]]
- br i1 %cmp, label %body, label %exit, !prof !0
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; No metadata and trip count is inestimable because no branch weights: create
-; metadata with no value.
-;
-; CHECK-LABEL: define void @no_branch_weights(i32 %n) {
-define void @no_branch_weights(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#NO_BRANCH_WEIGHTS:]]
- br i1 %cmp, label %body, label %exit
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; No metadata and trip count is inestimable because multiple latches: create
-; metadata with no value.
-;
-; CHECK-LABEL: define void @multi_latch(i32 %n, i1 %c) {
-define void @multi_latch(i32 %n, i1 %c) {
-; CHECK: entry:
-entry:
- br label %head
-
-; CHECK: head:
-head:
- %i = phi i32 [ 0, %entry ], [ %inc, %latch0], [ %inc, %latch1 ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %latch0, label %exit, !prof !0
- br i1 %cmp, label %latch0, label %exit, !prof !0
-
-; CHECK: latch0:
-latch0:
- ; CHECK: br i1 %c, label %head, label %latch1, !prof !0, !llvm.loop ![[#MULTI_LATCH:]]
- br i1 %c, label %head, label %latch1, !prof !0
-
-; CHECK: latch1:
-latch1:
- ; CHECK: br label %head
- br label %head
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; Metadata is already present with value, and trip count is estimable: keep the
-; existing metadata value.
-;
-; CHECK-LABEL: define void @val_estimable(i32 %n) {
-define void @val_estimable(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#VAL_ESTIMABLE:]]
- br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop !1
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; Metadata is already present with value, and trip count is inestimable: keep
-; the existing metadata value.
-;
-; CHECK-LABEL: define void @val_inestimable(i32 %n) {
-define void @val_inestimable(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#VAL_INESTIMABLE:]]
- br i1 %cmp, label %body, label %exit, !llvm.loop !3
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; Metadata is already present without value, and trip count is estimable: add
-; new value to metadata.
-;
-; CHECK-LABEL: define void @no_val_estimable(i32 %n) {
-define void @no_val_estimable(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop ![[#NO_VAL_ESTIMABLE:]]
- br i1 %cmp, label %body, label %exit, !prof !0, !llvm.loop !5
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; Metadata is already present without value, and trip count is inestimable:
-; leave no value on metadata.
-;
-; CHECK-LABEL: define void @no_val_inestimable(i32 %n) {
-define void @no_val_inestimable(i32 %n) {
-; CHECK: entry:
-entry:
- br label %body
-
-; CHECK: body:
-body:
- %i = phi i32 [ 0, %entry ], [ %inc, %body ]
- %inc = add nsw i32 %i, 1
- %cmp = icmp slt i32 %inc, %n
- ; CHECK: br i1 %cmp, label %body, label %exit, !llvm.loop ![[#NO_VAL_INESTIMABLE:]]
- br i1 %cmp, label %body, label %exit, !llvm.loop !7
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; Check that nested loops are visited.
-;
-; CHECK-LABEL: define void @nested(i32 %n) {
-define void @nested(i32 %n) {
-; CHECK: entry:
-entry:
- br label %loop0.head
-
-; CHECK: loop0.head:
-loop0.head:
- %loop0.i = phi i32 [ 0, %entry ], [ %loop0.inc, %loop0.latch ]
- br label %loop1.head
-
-; CHECK: loop1.head:
-loop1.head:
- %loop1.i = phi i32 [ 0, %loop0.head ], [ %loop1.inc, %loop1.latch ]
- br label %loop2
-
-; CHECK: loop2:
-loop2:
- %loop2.i = phi i32 [ 0, %loop1.head ], [ %loop2.inc, %loop2 ]
- %loop2.inc = add nsw i32 %loop2.i, 1
- %loop2.cmp = icmp slt i32 %loop2.inc, %n
- ; CHECK: br i1 %loop2.cmp, label %loop2, label %loop1.latch, !prof !0, !llvm.loop ![[#NESTED_LOOP2:]]
- br i1 %loop2.cmp, label %loop2, label %loop1.latch, !prof !0
-
-; CHECK: loop1.latch:
-loop1.latch:
- %loop1.inc = add nsw i32 %loop1.i, 1
- %loop1.cmp = icmp slt i32 %loop1.inc, %n
- ; CHECK: br i1 %loop1.cmp, label %loop1.head, label %loop0.latch, !prof !0, !llvm.loop ![[#NESTED_LOOP1:]]
- br i1 %loop1.cmp, label %loop1.head, label %loop0.latch, !prof !0
-
-; CHECK: loop0.latch:
-loop0.latch:
- %loop0.inc = add nsw i32 %loop0.i, 1
- %loop0.cmp = icmp slt i32 %loop0.inc, %n
- ; CHECK: br i1 %loop0.cmp, label %loop0.head, label %exit, !prof !0, !llvm.loop ![[#NESTED_LOOP0:]]
- br i1 %loop0.cmp, label %loop0.head, label %exit, !prof !0
-
-; CHECK: exit:
-exit:
- ret void
-}
-
-; CHECK: !0 = !{!"branch_weights", i32 9, i32 1}
-;
-; CHECK: ![[#ESTIMABLE]] = distinct !{![[#ESTIMABLE]], ![[#ESTIMABLE_TC:]]}
-; CHECK: ![[#ESTIMABLE_TC]] = !{!"llvm.loop.estimated_trip_count", i32 10}
-;
-; CHECK: ![[#NO_BRANCH_WEIGHTS]] = distinct !{![[#NO_BRANCH_WEIGHTS]], ![[#INESTIMABLE_TC:]]}
-; CHECK: ![[#INESTIMABLE_TC]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: ![[#MULTI_LATCH]] = distinct !{![[#MULTI_LATCH]], ![[#INESTIMABLE_TC:]]}
-;
-; CHECK: ![[#VAL_ESTIMABLE]] = distinct !{![[#VAL_ESTIMABLE]], ![[#VAL_TC:]]}
-; CHECK: ![[#VAL_TC]] = !{!"llvm.loop.estimated_trip_count", i32 5}
-; CHECK: ![[#VAL_INESTIMABLE]] = distinct !{![[#VAL_INESTIMABLE]], ![[#VAL_TC:]]}
-;
-; CHECK: ![[#NO_VAL_ESTIMABLE]] = distinct !{![[#NO_VAL_ESTIMABLE]], ![[#ESTIMABLE_TC:]]}
-; CHECK: ![[#NO_VAL_INESTIMABLE]] = distinct !{![[#NO_VAL_INESTIMABLE]], ![[#INESTIMABLE_TC:]]}
-;
-; CHECK: ![[#NESTED_LOOP2]] = distinct !{![[#NESTED_LOOP2]], ![[#ESTIMABLE_TC:]]}
-; CHECK: ![[#NESTED_LOOP1]] = distinct !{![[#NESTED_LOOP1]], ![[#ESTIMABLE_TC:]]}
-; CHECK: ![[#NESTED_LOOP0]] = distinct !{![[#NESTED_LOOP0]], ![[#ESTIMABLE_TC:]]}
-!0 = !{!"branch_weights", i32 9, i32 1}
-!1 = distinct !{!1, !2}
-!2 = !{!"llvm.loop.estimated_trip_count", i32 5}
-!3 = distinct !{!3, !2}
-!5 = distinct !{!5, !6}
-!6 = !{!"llvm.loop.estimated_trip_count"}
-!7 = distinct !{!7, !6}
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
index 7c96fce08a7e3..05674b9efc39d 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/block_scaling_decompr_8bit.ll
@@ -94,7 +94,7 @@ define dso_local noundef i32 @_Z33block_scaling_decompr_8bitjPK27compressed_data
; CHECK-NEXT: [[DST_ADDR_1]] = getelementptr inbounds nuw i8, ptr [[DST_ADDR_052]], i64 48
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT58]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END]], label %[[FOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END]], label %[[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[FOR_END]]:
; CHECK-NEXT: ret i32 0
;
@@ -801,9 +801,8 @@ attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
!4 = distinct !{!4, !5}
!5 = !{!"llvm.loop.mustprogress"}
;.
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]], [[META7:![0-9]+]]}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META5:![0-9]+]], [[META6:![0-9]+]]}
; CHECK: [[META5]] = !{!"llvm.loop.mustprogress"}
-; CHECK: [[META6]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META7]] = !{!"llvm.loop.unswitch.nontrivial.disable"}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META5]], [[META6]]}
+; CHECK: [[META6]] = !{!"llvm.loop.unswitch.nontrivial.disable"}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META5]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
index 496e33d09d2a1..d8fc42bfaaec2 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/extra-unroll-simplifications.ll
@@ -107,7 +107,7 @@ define void @cse_matching_load_from_previous_unrolled_iteration(i32 %N, ptr %src
; CHECK-NEXT: [[INDVARS_IV_NEXT_1]] = add nuw nsw i64 [[INDVARS_IV]], 2
; CHECK-NEXT: [[NITER_NEXT_1]] = add i64 [[NITER]], 2
; CHECK-NEXT: [[NITER_NCMP_1:%.*]] = icmp eq i64 [[NITER_NEXT_1]], [[UNROLL_ITER]]
-; CHECK-NEXT: br i1 [[NITER_NCMP_1]], label [[EXIT_LOOPEXIT_UNR_LCSSA]], label [[LOOP_LATCH]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[NITER_NCMP_1]], label [[EXIT_LOOPEXIT_UNR_LCSSA]], label [[LOOP_LATCH]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: exit.loopexit.unr-lcssa:
; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ 0, [[LOOP_LATCH_PREHEADER]] ], [ [[INDVARS_IV_NEXT_1]], [[LOOP_LATCH]] ]
; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
@@ -154,9 +154,8 @@ exit:
!1 = !{!"llvm.loop.mustprogress"}
!2 = !{!"llvm.loop.unroll.count", i32 2}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; CHECK: [[META1]] = !{!"llvm.loop.mustprogress"}
-; CHECK: [[META2]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
index 4b755fd9536d5..141503d344fe9 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
@@ -54,7 +54,7 @@ define i32 @read_only_loop_with_runtime_check(ptr noundef %array, i32 noundef %c
; CHECK-NEXT: [[ADD]] = add nsw i32 [[TMP8]], [[SUM_07]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[TMP0]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: if.then:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -129,7 +129,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
; CHECK-NEXT: [[DOTNOT_NOT:%.*]] = icmp ugt i64 [[S_COERCE1]], [[TMP0]]
-; CHECK-NEXT: br i1 [[DOTNOT_NOT]], label [[FOR_BODY_PREHEADER8:%.*]], label [[COND_FALSE_I:%.*]], !prof [[PROF5:![0-9]+]]
+; CHECK-NEXT: br i1 [[DOTNOT_NOT]], label [[FOR_BODY_PREHEADER8:%.*]], label [[COND_FALSE_I:%.*]], !prof [[PROF4:![0-9]+]]
; CHECK: for.body.preheader8:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER11:%.*]], label [[VECTOR_PH:%.*]]
@@ -148,7 +148,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[WIDE_LOAD10]], [[VEC_PHI9]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP5]], label [[SPAN_CHECKED_ACCESS_EXIT:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP5]], label [[SPAN_CHECKED_ACCESS_EXIT:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[ADD:%.*]] = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX]])
@@ -169,7 +169,7 @@ define dso_local noundef i32 @sum_prefix_with_sum(ptr %s.coerce0, i64 %s.coerce1
; CHECK-NEXT: [[ADD1]] = add nsw i32 [[TMP7]], [[RET_06]]
; CHECK-NEXT: [[INC]] = add nuw i64 [[I_07]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY1]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY1]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: cond.false.i:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -228,7 +228,7 @@ define hidden noundef nonnull align 4 dereferenceable(4) ptr @span_checked_acces
; CHECK-NEXT: [[__SIZE__I:%.*]] = getelementptr inbounds nuw i8, ptr [[THIS]], i64 8
; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[__SIZE__I]], align 8
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[__IDX]], [[TMP0]]
-; CHECK-NEXT: br i1 [[CMP]], label [[COND_END:%.*]], label [[COND_FALSE:%.*]], !prof [[PROF5]]
+; CHECK-NEXT: br i1 [[CMP]], label [[COND_END:%.*]], label [[COND_FALSE:%.*]], !prof [[PROF4]]
; CHECK: cond.false:
; CHECK-NEXT: tail call void @llvm.trap()
; CHECK-NEXT: unreachable
@@ -289,12 +289,11 @@ declare void @llvm.trap()
declare void @llvm.lifetime.end.p0(ptr nocapture)
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META3]], [[META2]]}
-; CHECK: [[PROF5]] = !{!"branch_weights", !"expected", i32 2000, i32 1}
-; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META3]], [[META2]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+; CHECK: [[PROF4]] = !{!"branch_weights", !"expected", i32 2000, i32 1}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]]}
+; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META2]], [[META1]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
index 552035c0fca6a..b056f44a6c469 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/indvars-vectorization.ll
@@ -88,7 +88,7 @@ define void @s172(i32 noundef %xa, i32 noundef %xb, ptr noundef %a, ptr noundef
; CHECK-NEXT: store i32 [[ADD]], ptr [[GEP_A]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], [[TMP1]]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 32000
-; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: for.end:
; CHECK-NEXT: ret void
;
@@ -131,10 +131,9 @@ for.end:
; CHECK: [[META2]] = distinct !{[[META2]], !"LVerDomain"}
; CHECK: [[META3]] = !{[[META4:![0-9]+]]}
; CHECK: [[META4]] = distinct !{[[META4]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]], [[META8:![0-9]+]], [[META9:![0-9]+]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META6:![0-9]+]], [[META7:![0-9]+]], [[META8:![0-9]+]]}
; CHECK: [[META6]] = !{!"llvm.loop.mustprogress"}
-; CHECK: [[META7]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META8]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META9]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP10]] = distinct !{[[LOOP10]], [[META6]], [[META7]], [[META8]]}
+; CHECK: [[META7]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META8]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META6]], [[META7]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
index 1012eb20691d2..afe7d7498fc1d 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll
@@ -141,7 +141,7 @@ define void @same_op2_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <8 x float> [[INTERLEAVED_VEC22]], ptr [[TMP7]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 576
-; CHECK-NEXT: br i1 [[TMP10]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP10]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -235,7 +235,7 @@ define void @same_op3(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <12 x float> [[INTERLEAVED_VEC]], ptr [[TMP2]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 384
-; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -334,7 +334,7 @@ define void @same_op3_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <12 x float> [[INTERLEAVED_VEC]], ptr [[TMP3]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 384
-; CHECK-NEXT: br i1 [[TMP12]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP12]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -428,7 +428,7 @@ define void @same_op4(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP2]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 288
-; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_END13:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -527,7 +527,7 @@ define void @same_op4_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP3]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 288
-; CHECK-NEXT: br i1 [[TMP5]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP5]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -630,7 +630,7 @@ define void @same_op6(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[ARRAYIDX9_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 6
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1146
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -738,7 +738,7 @@ define void @same_op6_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <2 x float> [[TMP13]], ptr [[ARRAYIDX7_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 6
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1146
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END11:.*]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END11:.*]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -841,7 +841,7 @@ define void @same_op8(ptr noalias noundef %a, ptr noundef %b, ptr noundef %c) {
; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[ARRAYIDX9_4]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 8
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i64 [[INDVARS_IV]], 1144
-; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]], !llvm.loop [[LOOP11:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[FOR_COND1_PREHEADER]], label %[[FOR_END13:.*]]
; CHECK: [[FOR_END13]]:
; CHECK-NEXT: ret void
;
@@ -940,7 +940,7 @@ define void @same_op8_splat(ptr noalias noundef %a, ptr noundef %b, ptr noundef
; CHECK-NEXT: store <16 x float> [[INTERLEAVED_VEC]], ptr [[TMP6]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], 144
-; CHECK-NEXT: br i1 [[TMP25]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP25]], label %[[FOR_END11:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: [[FOR_END11]]:
; CHECK-NEXT: ret void
;
@@ -1014,17 +1014,13 @@ for.end11: ; preds = %for.cond
ret void
}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META1]]}
-; CHECK: [[LOOP10]] = distinct !{[[LOOP10]], [[META1]]}
-; CHECK: [[LOOP11]] = distinct !{[[LOOP11]], [[META1]]}
-; CHECK: [[LOOP12]] = distinct !{[[LOOP12]], [[META1]], [[META2]], [[META3]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]]}
+; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META2]]}
+; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
index c8e7868b41a82..e8709a50524f8 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll
@@ -81,7 +81,7 @@ define nofpclass(nan inf) double @monte_simple(i32 noundef %nblocks, i32 noundef
; CHECK-NEXT: [[V1_2]] = fadd reassoc arcp contract afn double [[V1_012]], [[ADD4]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: [[FOR_END_LOOPEXIT]]:
; CHECK-NEXT: [[V0_1:%.*]] = phi double [ [[TMP22]], %[[MIDDLE_BLOCK]] ], [ [[V0_2]], %[[FOR_BODY]] ]
; CHECK-NEXT: [[V1_1:%.*]] = phi double [ [[TMP21]], %[[MIDDLE_BLOCK]] ], [ [[V1_2]], %[[FOR_BODY]] ]
@@ -239,7 +239,7 @@ define nofpclass(nan inf) double @monte_exp(i32 noundef %nblocks, i32 noundef %R
; CHECK-NEXT: [[TMP23]] = fadd reassoc arcp contract afn <4 x double> [[VEC_PHI31]], [[TMP21]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDVARS_IV1]], 8
; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP24]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP24]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc arcp contract afn <4 x double> [[TMP23]], [[TMP22]]
; CHECK-NEXT: [[TMP25:%.*]] = tail call reassoc arcp contract afn double @llvm.vector.reduce.fadd.v4f64(double -0.000000e+00, <4 x double> [[BIN_RDX]])
@@ -269,19 +269,19 @@ define nofpclass(nan inf) double @monte_exp(i32 noundef %nblocks, i32 noundef %R
; CHECK-NEXT: [[V1_2_US]] = fadd reassoc arcp contract afn double [[V1_116_US]], [[ADD7_US1]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND25_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND25_NOT]], label %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]], label %[[FOR_BODY3_US]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND25_NOT]], label %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]], label %[[FOR_BODY3_US]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[FOR_COND1_FOR_INC8_CRIT_EDGE_US]]:
; CHECK-NEXT: [[V0_2_US_LCSSA]] = phi double [ [[TMP26]], %[[MIDDLE_BLOCK]] ], [ [[V0_2_US]], %[[FOR_BODY3_US]] ]
; CHECK-NEXT: [[V1_2_US_LCSSA]] = phi double [ [[TMP25]], %[[MIDDLE_BLOCK]] ], [ [[V1_2_US]], %[[FOR_BODY3_US]] ]
; CHECK-NEXT: [[INC9_US]] = add nuw nsw i32 [[BLOCK_017_US]], 1
; CHECK-NEXT: [[EXITCOND26_NOT:%.*]] = icmp eq i32 [[INC9_US]], [[NBLOCKS]]
-; CHECK-NEXT: br i1 [[EXITCOND26_NOT]], label %[[FOR_END10]], label %[[FOR_BODY_US]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND26_NOT]], label %[[FOR_END10]], label %[[FOR_BODY_US]]
; CHECK: [[FOR_BODY]]:
; CHECK-NEXT: [[BLOCK_017:%.*]] = phi i32 [ [[INC9:%.*]], %[[FOR_BODY]] ], [ 0, %[[FOR_BODY_LR_PH]] ]
; CHECK-NEXT: tail call void @resample(i32 noundef [[RAND_BLOCK_LENGTH]], ptr noundef [[SAMPLES]])
; CHECK-NEXT: [[INC9]] = add nuw nsw i32 [[BLOCK_017]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC9]], [[NBLOCKS]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END10]], label %[[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_END10]], label %[[FOR_BODY]]
; CHECK: [[FOR_END10]]:
; CHECK-NEXT: [[V0_0_LCSSA:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ], [ [[V0_2_US_LCSSA]], %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]] ], [ 0.000000e+00, %[[FOR_BODY]] ]
; CHECK-NEXT: [[V1_0_LCSSA:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ], [ [[V1_2_US_LCSSA]], %[[FOR_COND1_FOR_INC8_CRIT_EDGE_US]] ], [ 0.000000e+00, %[[FOR_BODY]] ]
@@ -403,14 +403,10 @@ declare void @resample(i32 noundef, ptr noundef)
declare double @llvm.exp2.f64(double)
declare void @llvm.lifetime.end.p0(ptr nocapture)
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META3]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]], [[META3]]}
-; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META3]], [[META2]]}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META8:![0-9]+]]}
-; CHECK: [[META8]] = !{!"llvm.loop.unswitch.nontrivial.disable"}
-; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll b/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
index 1c8f2c4612c4e..482907d295706 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/pr88239.ll
@@ -48,8 +48,7 @@ define void @foo(ptr noalias noundef %0, ptr noalias noundef %1) optsize {
br label %3
}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll b/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
index a101a686b02c7..cb378465e30ec 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/preserve-access-group.ll
@@ -63,7 +63,7 @@ define void @test(i32 noundef %nface, i32 noundef %ncell, ptr noalias noundef %f
; CHECK-NEXT: store double [[TMP26]], ptr [[ARRAYIDX4_3]], align 8, !tbaa [[TBAA5]], !llvm.access.group [[ACC_GRP4]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[TMP0]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
;
entry:
%nface.addr = alloca i32, align 4
@@ -197,11 +197,10 @@ attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: re
; CHECK: [[ACC_GRP4]] = distinct !{}
; CHECK: [[TBAA5]] = !{[[META6:![0-9]+]], [[META6]], i64 0}
; CHECK: [[META6]] = !{!"double", [[META2]], i64 0}
-; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META9:![0-9]+]], [[META10:![0-9]+]], [[META11:![0-9]+]], [[META12:![0-9]+]]}
+; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META8:![0-9]+]], [[META9:![0-9]+]], [[META10:![0-9]+]], [[META11:![0-9]+]]}
; CHECK: [[META8]] = !{!"llvm.loop.mustprogress"}
; CHECK: [[META9]] = !{!"llvm.loop.parallel_accesses", [[ACC_GRP4]]}
-; CHECK: [[META10]] = !{!"llvm.loop.estimated_trip_count"}
-; CHECK: [[META11]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META12]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP13]] = distinct !{[[LOOP13]], [[META8]], [[META9]], [[META10]], [[META12]], [[META11]]}
+; CHECK: [[META10]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META11]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP12]] = distinct !{[[LOOP12]], [[META8]], [[META9]], [[META11]], [[META10]]}
;.
diff --git a/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll b/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
index c013984202a29..904ea6c2d9e73 100644
--- a/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
+++ b/llvm/test/Transforms/PhaseOrdering/branch-dom-cond.ll
@@ -13,7 +13,7 @@ define void @growTables(ptr %p) {
; CHECK-NEXT: [[CALL9:%.*]] = load volatile ptr, ptr [[P]], align 8
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_02]], 1
; CHECK-NEXT: [[CMP7:%.*]] = icmp slt i32 [[INC]], [[CALL]]
-; CHECK-NEXT: br i1 [[CMP7]], label %[[FOR_BODY]], label %[[FOR_BODY12:.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: br i1 [[CMP7]], label %[[FOR_BODY]], label %[[FOR_BODY12:.*]]
; CHECK: [[FOR_BODY12]]:
; CHECK-NEXT: [[CALL14:%.*]] = load volatile ptr, ptr [[P]], align 8
; CHECK-NEXT: br label %[[COMMON_RET]]
@@ -45,7 +45,3 @@ for.body12:
common.ret:
ret void
}
-;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.estimated_trip_count"}
-;.
diff --git a/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll b/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
index 0c44d960be244..3c0bc8a39ebeb 100644
--- a/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
+++ b/llvm/test/Verifier/llvm.loop.estimated_trip_count.ll
@@ -15,16 +15,19 @@ exit:
; GOOD-NOT: {{.}}
-; BAD-VALUE: Expected optional second operand to be an integer constant of type i32 or smaller
+; BAD-VALUE: Expected second operand to be an integer constant of type i32 or smaller
; BAD-VALUE-NEXT: !1 = !{!"llvm.loop.estimated_trip_count",
-; TOO-MANY: Expected one or two operands
+; TOO-FEW: Expected two operands
+; TOO-FEW-NEXT: !1 = !{!"llvm.loop.estimated_trip_count"}
+
+; TOO-MANY: Expected two operands
; TOO-MANY-NEXT: !1 = !{!"llvm.loop.estimated_trip_count", i32 5, i32 5}
; No value.
; RUN: cp %s %t
; RUN: echo '!1 = !{!"llvm.loop.estimated_trip_count"}' >> %t
-; RUN: %{RUN} GOOD
+; RUN: not %{RUN} TOO-FEW
; i16 value.
; RUN: cp %s %t
>From 59cd1847a79938bfa4229e18a392c4113dc1d20a Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Tue, 19 Aug 2025 15:25:39 -0400
Subject: [PATCH 24/27] Fix case where nested loops share latch
---
.../include/llvm/Transforms/Utils/LoopUtils.h | 14 +++--
llvm/lib/Transforms/Utils/LoopUtils.cpp | 49 +++++++++++++----
.../Transforms/Utils/LoopUtilsTest.cpp | 53 +++++++++++++++++++
3 files changed, 97 insertions(+), 19 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 0ee22a24e56c5..da2c835e9e4b5 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -324,12 +324,12 @@ LLVM_ABI void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
unsigned V = 0);
/// Return either:
+/// - \c std::nullopt, if the implementation is unable to handle the loop form
+/// of \p L (e.g., \p L must have a latch block that controls the loop exit).
/// - The value of \c llvm.loop.estimated_trip_count from the loop metadata of
/// \p L, if that metadata is present.
/// - Else, a new estimate of the trip count from the latch branch weights of
-/// \p L, if the estimation's implementation is able to handle the loop form
-/// of \p L (e.g., \p L must have a latch block that controls the loop exit).
-/// - Else, \c std::nullopt.
+/// \p L.
///
/// An estimated trip count is always a valid positive trip count, saturated at
/// \c UINT_MAX.
@@ -350,17 +350,15 @@ getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight = nullptr);
/// Set \c llvm.loop.estimated_trip_count with the value \p EstimatedTripCount
-/// in the loop metadata of \p L.
+/// in the loop metadata of \p L. Return false if the implementation is unable
+/// to handle the loop form of \p L (e.g., \p L must have a latch block that
+/// controls the loop exit). Otherwise, return true.
///
/// In addition, if \p EstimatedLoopInvocationWeight, set the branch weight
/// metadata of \p L to reflect that \p L has an estimated
/// \p EstimatedTripCount iterations and has \c *EstimatedLoopInvocationWeight
/// exit weight through the loop's latch.
///
-/// Return false if \p EstimatedLoopInvocationWeight and if branch weight
-/// metadata could not be successfully updated (e.g., if \p L does not have a
-/// latch block that controls the loop exit). Otherwise, return true.
-///
/// TODO: Eventually, once all passes have migrated away from setting branch
/// weights to indicate estimated trip counts, this function will drop the
/// \p EstimatedLoopInvocationWeight parameter.
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 38d249a50b0a8..b5238f64d99fb 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -868,6 +868,27 @@ static std::optional<unsigned> estimateLoopTripCount(Loop *L) {
std::optional<unsigned>
llvm::getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight) {
+ // If EstimatedLoopInvocationWeight, we do not support this loop if
+ // getExpectedExitLoopLatchBranch returns nullptr.
+ //
+ // FIXME: Also, this is a stop-gap solution for nested loops. It avoids
+ // mistaking LLVMLoopEstimatedTripCount metadata to be for an outer loop when
+ // it was created for an inner loop. The problem is that loop metadata is
+ // attached to the branch instruction in the loop latch block, but that can be
+ // shared by the loops. The solution is to attach loop metadata to loop
+ // headers instead, but that would be a large change to LLVM.
+ //
+ // Until that happens, we work around the problem as follows.
+ // getExpectedExitLoopLatchBranch (which also guards
+ // setLoopEstimatedTripCount) will not recognize the same latch for both loops
+ // unless the latch exits both loops and has only two successors. However, to
+ // exit both loops, the latch must have at least three successors: the inner
+ // loop header, the outer loop header (exit for the inner loop), and an exit
+ // for the outer loop.
+ BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L);
+ if (!ExitingBranch)
+ return std::nullopt;
+
// If requested, either compute *EstimatedLoopInvocationWeight or return
// nullopt if cannot.
//
@@ -875,16 +896,14 @@ llvm::getLoopEstimatedTripCount(Loop *L,
// weights to indicate estimated trip counts, this function will drop the
// EstimatedLoopInvocationWeight parameter.
if (EstimatedLoopInvocationWeight) {
- if (BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L)) {
- uint64_t LoopWeight = 0, ExitWeight = 0; // Inits expected to be unused.
- if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
- return std::nullopt;
- if (L->contains(ExitingBranch->getSuccessor(1)))
- std::swap(LoopWeight, ExitWeight);
- if (!ExitWeight)
- return std::nullopt;
- *EstimatedLoopInvocationWeight = ExitWeight;
- }
+ uint64_t LoopWeight = 0, ExitWeight = 0; // Inits expected to be unused.
+ if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
+ return std::nullopt;
+ if (L->contains(ExitingBranch->getSuccessor(1)))
+ std::swap(LoopWeight, ExitWeight);
+ if (!ExitWeight)
+ return std::nullopt;
+ *EstimatedLoopInvocationWeight = ExitWeight;
}
// Return the estimated trip count from metadata unless the metadata is
@@ -903,6 +922,15 @@ llvm::getLoopEstimatedTripCount(Loop *L,
bool llvm::setLoopEstimatedTripCount(
Loop *L, unsigned EstimatedTripCount,
std::optional<unsigned> EstimatedloopInvocationWeight) {
+ // If EstimatedLoopInvocationWeight, we do not support this loop if
+ // getExpectedExitLoopLatchBranch returns nullptr.
+ //
+ // FIXME: See comments in getLoopEstimatedTripCount for why this is required
+ // here regardless of EstimatedLoopInvocationWeight.
+ BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
+ if (!LatchBranch)
+ return false;
+
// Set the metadata.
addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount, EstimatedTripCount);
@@ -915,7 +943,6 @@ bool llvm::setLoopEstimatedTripCount(
// here at all.
if (!EstimatedloopInvocationWeight)
return true;
- BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
if (!LatchBranch)
return false;
diff --git a/llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp b/llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp
index c22a3582bee86..ce002e9239960 100644
--- a/llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp
+++ b/llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp
@@ -142,3 +142,56 @@ TEST(LoopUtils, IsKnownNonPositiveInLoopTest) {
EXPECT_EQ(isKnownNonPositiveInLoop(ArgSCEV, L, SE), true);
});
}
+
+// The inner and outer loop here share a latch. Because any loop metadata must
+// be attached to that latch, loop metadata cannot distinguish between the two
+// loops. Until that problem is solved (by moving loop metadata to loops'
+// header blocks instead), getLoopEstimatedTripCount and
+// setLoopEstimatedTripCount must refuse to operate on at least one of the two
+// loops. They choose to reject the outer loop here because the latch does not
+// exit it.
+TEST(LoopUtils, nestedLoopSharedLatchEstimatedTripCount) {
+ LLVMContext C;
+ std::unique_ptr<Module> M =
+ parseIR(C, "declare i1 @f()\n"
+ "declare i1 @g()\n"
+ "define void @foo() {\n"
+ "entry:\n"
+ " br label %outer\n"
+ "outer:\n"
+ " %c0 = call i1 @f()\n"
+ " br i1 %c0, label %inner, label %exit, !prof !0\n"
+ "inner:\n"
+ " %c1 = call i1 @g()\n"
+ " br i1 %c1, label %inner, label %outer, !prof !1\n"
+ "exit:\n"
+ " ret void\n"
+ "}\n"
+ "!0 = !{!\"branch_weights\", i32 100, i32 1}\n"
+ "!1 = !{!\"branch_weights\", i32 4, i32 1}\n"
+ "\n");
+
+ run(*M, "foo",
+ [&](Function &F, DominatorTree &DT, ScalarEvolution &SE, LoopInfo &LI) {
+ assert(LI.end() - LI.begin() == 1 && "Expected one outer loop");
+ Loop *Outer = *LI.begin();
+ assert(Outer->end() - Outer->begin() == 1 && "Expected one inner loop");
+ Loop *Inner = *Outer->begin();
+
+ // Even before llvm.loop.estimated_trip_count is added to either loop,
+ // getLoopEstimatedTripCount rejects the outer loop.
+ EXPECT_EQ(getLoopEstimatedTripCount(Inner), 5);
+ EXPECT_EQ(getLoopEstimatedTripCount(Outer), std::nullopt);
+
+ // setLoopEstimatedTripCount for the inner loop does not affect
+ // getLoopEstimatedTripCount for the outer loop.
+ EXPECT_EQ(setLoopEstimatedTripCount(Inner, 100), true);
+ EXPECT_EQ(getLoopEstimatedTripCount(Inner), 100);
+ EXPECT_EQ(getLoopEstimatedTripCount(Outer), std::nullopt);
+
+ // setLoopEstimatedTripCount rejects the outer loop.
+ EXPECT_EQ(setLoopEstimatedTripCount(Outer, 999), false);
+ EXPECT_EQ(getLoopEstimatedTripCount(Inner), 100);
+ EXPECT_EQ(getLoopEstimatedTripCount(Outer), std::nullopt);
+ });
+}
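As a rough illustration of why the test above expects an estimate of 5 for the inner loop (latch branch weights 4 and 1), here is a minimal standalone sketch. It assumes the estimate is one plus the ratio of the latch's loop weight to its exit weight, rounded to nearest and saturated at UINT_MAX. The name `sketchEstimatedTripCount` is hypothetical and is not an LLVM function; the real estimateLoopTripCount may differ in detail.

```cpp
#include <climits>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): estimate a trip count from the two
// latch branch weights. Assumption: the estimate is one plus the loop/exit
// weight ratio rounded to nearest, saturated at UINT_MAX.
std::optional<unsigned> sketchEstimatedTripCount(uint64_t LoopWeight,
                                                 uint64_t ExitWeight) {
  // Without an exit weight there is nothing to divide by, so give up.
  if (ExitWeight == 0)
    return std::nullopt;

  // Expected number of times the backedge is taken per loop exit, rounded to
  // nearest (halves round up) without overflowing uint64_t.
  uint64_t BackedgeCount = LoopWeight / ExitWeight;
  if (LoopWeight % ExitWeight >= ExitWeight - ExitWeight / 2)
    ++BackedgeCount;

  // The loop body runs once more than the backedge is taken; saturate.
  if (BackedgeCount >= UINT_MAX)
    return UINT_MAX;
  return static_cast<unsigned>(BackedgeCount + 1);
}
```

Under this assumption, weights 4 and 1 give an estimate of 5, and the weights 9 and 1 from the commit message's example give 10.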
>From 5719779bf677a6461456e9ebeedbcb50a94216b2 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 25 Aug 2025 17:11:21 -0400
Subject: [PATCH 25/27] Remove redundant code
---
llvm/lib/Transforms/Utils/LoopUtils.cpp | 2 --
1 file changed, 2 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 071f602fdec5e..0baee56fd6009 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -943,8 +943,6 @@ bool llvm::setLoopEstimatedTripCount(
// here at all.
if (!EstimatedloopInvocationWeight)
return true;
- if (!LatchBranch)
- return false;
// Calculate taken and exit weights.
unsigned LatchExitWeight = 0;
>From 98cab7bbdeebb29407ebdae606c37eb4b8e4cdbc Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 25 Aug 2025 17:33:20 -0400
Subject: [PATCH 26/27] Clarify recent comments some
---
llvm/lib/Transforms/Utils/LoopUtils.cpp | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 0baee56fd6009..37dfaa8df6c6b 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -875,16 +875,21 @@ llvm::getLoopEstimatedTripCount(Loop *L,
// mistaking LLVMLoopEstimatedTripCount metadata to be for an outer loop when
// it was created for an inner loop. The problem is that loop metadata is
// attached to the branch instruction in the loop latch block, but that can be
- // shared by the loops. The solution is to attach loop metadata to loop
- // headers instead, but that would be a large change to LLVM.
+ // shared by the loops. A solution is to attach loop metadata to loop headers
+ // instead, but that would be a large change to LLVM.
//
// Until that happens, we work around the problem as follows.
// getExpectedExitLoopLatchBranch (which also guards
- // setLoopEstimatedTripCount) will not recognize the same latch for both loops
- // unless the latch exits both loops and has only two successors. However, to
- // exit both loops, the latch must have at least three successors: the inner
- // loop header, the outer loop header (exit for the inner loop), and an exit
- // for the outer loop.
+ // setLoopEstimatedTripCount) returns nullptr for a loop unless the loop has
+ // one latch and that latch has exactly two successors one of which is an exit
+ // from the loop. If the latch is shared by nested loops, then that condition
+ // might hold for the inner loop but cannot hold for the outer loop:
+ // - Because the latch is shared, it must have at least two successors: the
+ // inner loop header and the outer loop header, which is also an exit for
+ // the inner loop. That satisfies the condition for the inner loop.
+ // - To satisfy the condition for the outer loop, the latch must have a third
+ // successor that is an exit for the outer loop. But that violates the
+ // condition for both loops.
BranchInst *ExitingBranch = getExpectedExitLoopLatchBranch(L);
if (!ExitingBranch)
return std::nullopt;
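The condition described in the comment above can be sketched with a toy model that does not depend on LLVM. Everything here is hypothetical: `ToyLoop` and `latchControlsExit` are illustrative stand-ins, not LLVM types or functions, and the check only approximates what the comment says getExpectedExitLoopLatchBranch requires (a latch with exactly two successors, one of which leaves the loop).

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a loop (hypothetical, not llvm::Loop): just the names of
// the blocks the loop contains.
struct ToyLoop {
  std::set<std::string> Blocks;
};

// Returns true if a latch with the given successors satisfies the condition
// for loop L: exactly two successors, one staying in L and one leaving L.
bool latchControlsExit(const ToyLoop &L,
                       const std::vector<std::string> &LatchSuccs) {
  if (LatchSuccs.size() != 2)
    return false;
  bool FirstInside = L.Blocks.count(LatchSuccs[0]) != 0;
  bool SecondInside = L.Blocks.count(LatchSuccs[1]) != 0;
  // Exactly one successor must be outside the loop.
  return FirstInside != SecondInside;
}
```

Modeling the unit test's IR, the inner loop is {inner}, the outer loop is {outer, inner}, and the shared latch `inner` branches to `inner` and `outer`. The check then holds for the inner loop but not for the outer one, and adding a third successor that exits the outer loop makes it fail for both.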
>From ea9addbd2505ff895c151d855b1feb03c84c9449 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Mon, 15 Sep 2025 12:18:10 -0400
Subject: [PATCH 27/27] Fake commit to re-trigger CI