[llvm] [LoopPeel] Fix branch weights (PR #128785)

Joel E. Denny via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 25 15:12:39 PST 2025


https://github.com/jdenny-ornl created https://github.com/llvm/llvm-project/pull/128785

For example, `llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll` tests the following LLVM IR:

```
define void @test() {
entry:
  br label %loop

loop:
  %x = call i32 @get.x()
  switch i32 %x, label %loop.latch [
  i32 0, label %loop.latch
  i32 1, label %loop.exit
  i32 2, label %loop.exit
  ], !prof !0

loop.latch:
  br label %loop

loop.exit:
  ret void
}

!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
```

Given those branch weights, once any loop iteration is actually reached, the probability of the loop exiting at the iteration's end is (20+10)/(100+200+20+10) = 1/11.  That is, the loop body is expected to execute 11 times per loop entry.  `opt -passes='print<block-freq>'` shows that 11 is indeed the frequency of the loop body:

```
block-frequency-info: test
 - entry: float = 1.0, int = 1637672590245888
 - loop: float = 11.0, int = 18014398509481984
 - loop.latch: float = 10.0, int = 16376725919236096
 - loop.exit: float = 1.0, int = 1637672590245888
```
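The arithmetic above can be reproduced outside of LLVM. The following is a minimal sketch, not code from `LoopPeel.cpp`: the helper names `exitProbability` and `headerFrequency` are hypothetical, and it assumes the weights are listed in successor order (default destination first, then the cases), with a flag per successor saying whether it leaves the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: probability of leaving the loop at one exiting
// terminator.  Weights[i] is the branch weight of successor i, and
// IsExit[i] says whether successor i is outside the loop.
double exitProbability(const std::vector<uint32_t> &Weights,
                       const std::vector<bool> &IsExit) {
  assert(Weights.size() == IsExit.size() && "one flag per successor");
  uint64_t Exit = 0, Total = 0;
  for (std::size_t I = 0; I < Weights.size(); ++I) {
    Total += Weights[I];
    if (IsExit[I])
      Exit += Weights[I];
  }
  assert(Total != 0 && "weights must not all be zero");
  return static_cast<double>(Exit) / static_cast<double>(Total);
}

// Hypothetical helper: expected executions of the loop header per loop
// entry.  With a fixed exit probability p this is the geometric series
// 1 + (1-p) + (1-p)^2 + ... = 1/p.
double headerFrequency(double ExitProb) {
  assert(ExitProb > 0.0 && "the loop must be able to exit");
  return 1.0 / ExitProb;
}
```

Calling `exitProbability({100, 200, 20, 10}, {false, false, true, true})` gives 30/330 = 1/11, and `headerFrequency` of that value gives 11, matching the `loop` line in the block-frequency output.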

Key Observation: The frequency of reaching any particular iteration is lower than that of reaching the previous iteration, precisely because the previous iteration has a non-zero probability of exiting the loop. This holds even though every loop iteration, once actually reached, has exactly the same probability of exiting and exactly the same branch weights.
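To make that observation concrete, here is a small sketch under the assumption that every iteration exits with the same probability p. The function name `reachFrequency` is hypothetical and only illustrates the arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: frequency of reaching iteration K (counting from 0)
// per loop entry, assuming every iteration that is reached exits with the
// same probability ExitProb.  Reaching iteration K requires K consecutive
// "continue" outcomes, so the result is (1 - ExitProb)^K.
double reachFrequency(double ExitProb, unsigned K) {
  assert(ExitProb >= 0.0 && ExitProb <= 1.0 && "not a probability");
  double Reach = 1.0;
  for (unsigned I = 0; I < K; ++I)
    Reach *= 1.0 - ExitProb;
  return Reach;
}
```

With p = 1/11, iterations 0, 1, and 2 are reached with frequency 1, 10/11 (about 0.909), and 100/121 (about 0.826). The sequence decreases even though p, and therefore the branch weights, never change.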

After peeling 2 iterations as in the test, we expect those observations not to change, but they do under the implementation without this patch.  The block frequency becomes 1.0 for the first iteration, 0.90909 for the second, and 7.3636 for the main loop body. Again, a decreasing frequency is expected, but it decreases too much: the total frequency of the original loop body, summed across the two peeled copies and the remaining loop, becomes 9.2727 instead of 11.  The new branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
!1 = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
!2 = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
```

The exit probability is now 1/11 for the first peeled iteration, 1/10 for the second, and 1/9 for the remaining loop iterations.  Based on comments in `LoopPeel.cpp`, this behavior was apparently intended to ensure a decreasing frequency.  However, as explained above for the original loop, the frequency already decreases correctly without decreasing the branch weights across iterations.
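The frequencies reported for the pre-patch weights (1.0, 0.90909, 7.3636, total 9.2727) can be checked by hand. The sketch below is an illustration only: `exitProbability` and `stageFrequencies` are hypothetical helpers, not LLVM APIs, and the model assumes one exiting terminator per iteration, as in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: exit probability of one terminator.  IsExit[i] says
// whether successor i leaves the loop.
double exitProbability(const std::vector<uint32_t> &Weights,
                       const std::vector<bool> &IsExit) {
  assert(Weights.size() == IsExit.size() && "one flag per successor");
  uint64_t Exit = 0, Total = 0;
  for (std::size_t I = 0; I < Weights.size(); ++I) {
    Total += Weights[I];
    if (IsExit[I])
      Exit += Weights[I];
  }
  assert(Total != 0 && "weights must not all be zero");
  return static_cast<double>(Exit) / static_cast<double>(Total);
}

// Hypothetical helper: block frequencies per loop entry.  Stages holds the
// weights of each peeled copy in order, followed by the weights of the
// remaining loop.  Peeled copies run at most once; the remaining loop is a
// geometric series scaled by the frequency of reaching it.
std::vector<double>
stageFrequencies(const std::vector<std::vector<uint32_t>> &Stages,
                 const std::vector<bool> &IsExit) {
  assert(!Stages.empty() && "need at least the remaining loop");
  std::vector<double> Freqs;
  double Reach = 1.0; // frequency of reaching the current stage
  for (std::size_t I = 0; I + 1 < Stages.size(); ++I) {
    Freqs.push_back(Reach);
    Reach *= 1.0 - exitProbability(Stages[I], IsExit);
  }
  double LoopExit = exitProbability(Stages.back(), IsExit);
  assert(LoopExit > 0.0 && "the remaining loop must be able to exit");
  Freqs.push_back(Reach / LoopExit);
  return Freqs;
}
```

Feeding in the three weight sets `!0`, `!1`, and `!2` with exit flags `{false, false, true, true}` yields 1, 10/11, and 81/11, which sum to 102/11, or about 9.2727.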

This patch changes the peeling implementation so that it no longer decreases the branch weights across loop iterations.  The probabilities for every iteration are then the same as they were in the original loop, and the total frequency of the loop body, summed across all its occurrences, remains 11 after peeling.
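That the total stays at 11 for any peel count follows from the geometric series. As a sketch, assuming a single exit probability p shared by every peeled copy and by the remaining loop (the function name `totalBodyFrequency` is hypothetical):

```cpp
#include <cassert>

// Hypothetical helper: total frequency of the original loop body after
// peeling PeelCount iterations, when every copy keeps the original exit
// probability ExitProb.  The peeled copies contribute
// 1 + (1-p) + ... + (1-p)^(N-1), and the remaining loop contributes
// (1-p)^N / p, so the sum is always 1/p.
double totalBodyFrequency(double ExitProb, unsigned PeelCount) {
  assert(ExitProb > 0.0 && ExitProb <= 1.0 && "the loop must be able to exit");
  double Reach = 1.0; // frequency of reaching the next copy
  double Total = 0.0;
  for (unsigned I = 0; I < PeelCount; ++I) {
    Total += Reach;
    Reach *= 1.0 - ExitProb;
  }
  return Total + Reach / ExitProb;
}
```

For p = 1/11 and a peel count of 2, the terms are 1, 10/11, and 100/11 for the remaining loop, which sum to 11. The same total results for a peel count of 0, that is, for the original loop.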

>From be2ad3001fe157f974128f262ea7f08d99157540 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Tue, 25 Feb 2025 17:35:02 -0500
Subject: [PATCH] [LoopPeel] Fix branch weights

For example, `llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll`
tests the following LLVM IR:

```
define void @test() {
entry:
  br label %loop

loop:
  %x = call i32 @get.x()
  switch i32 %x, label %loop.latch [
  i32 0, label %loop.latch
  i32 1, label %loop.exit
  i32 2, label %loop.exit
  ], !prof !0

loop.latch:
  br label %loop

loop.exit:
  ret void
}

!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
```

Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
(20+10)/(100+200+20+10) = 1/11.  That is, the loop body is expected to
execute 11 times per loop entry.  `opt -passes='print<block-freq>'`
shows that 11 is indeed the frequency of the loop body:

```
block-frequency-info: test
 - entry: float = 1.0, int = 1637672590245888
 - loop: float = 11.0, int = 18014398509481984
 - loop.latch: float = 10.0, int = 16376725919236096
 - loop.exit: float = 1.0, int = 1637672590245888
```

Key Observation: The frequency of reaching any particular iteration is
logically less than for the previous iteration exactly because the
previous iteration has a non-zero probability of exiting the loop.
This observation holds even though every loop iteration, once actually
reached, has exactly the same probability of exiting and exactly the
same branch weights.

After peeling 2 iterations as in the test, we expect those
observations not to change, but they do under the implementation
without this patch.  The block frequency becomes 1.0 for the first
iteration, 0.90909 for the second, and 7.3636 for the main loop body.
Again, a decreasing frequency is expected, but it decreases too much:
the total frequency of the original loop body becomes 9.2727.  The new
branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
!1 = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
!2 = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
```

The exit probability is now 1/11 for the first peeled iteration, 1/10
for the second, and 1/9 for the remaining loop iterations.  Based on
comments in `LoopPeel.cpp`, it seems this behavior was trying to
ensure a decreasing frequency.  However, as explained above for the
original loop, that happens correctly without decreasing the branch
weights across iterations.

This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the probabilities for
every iteration are the same as they were in the original loop.  The
total frequency of the loop body, summed across all its occurrences,
thus remains 11 after peeling.
---
 llvm/lib/Transforms/Utils/LoopPeel.cpp        | 105 +++---------------
 .../LoopUnroll/peel-branch-weights.ll         |  64 +++++------
 2 files changed, 47 insertions(+), 122 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 9a24c1b0d03de..7918d78498f85 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -657,84 +657,6 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   }
 }
 
-struct WeightInfo {
-  // Weights for current iteration.
-  SmallVector<uint32_t> Weights;
-  // Weights to subtract after each iteration.
-  const SmallVector<uint32_t> SubWeights;
-};
-
-/// Update the branch weights of an exiting block of a peeled-off loop
-/// iteration.
-/// Let F is a weight of the edge to continue (fallthrough) into the loop.
-/// Let E is a weight of the edge to an exit.
-/// F/(F+E) is a probability to go to loop and E/(F+E) is a probability to
-/// go to exit.
-/// Then, Estimated ExitCount = F / E.
-/// For I-th (counting from 0) peeled off iteration we set the weights for
-/// the peeled exit as (EC - I, 1). It gives us reasonable distribution,
-/// The probability to go to exit 1/(EC-I) increases. At the same time
-/// the estimated exit count in the remainder loop reduces by I.
-/// To avoid dealing with division rounding we can just multiple both part
-/// of weights to E and use weight as (F - I * E, E).
-static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
-  setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
-    if (SubWeight != 0)
-      // Don't set the probability of taking the edge from latch to loop header
-      // to less than 1:1 ratio (meaning Weight should not be lower than
-      // SubWeight), as this could significantly reduce the loop's hotness,
-      // which would be incorrect in the case of underestimating the trip count.
-      Info.Weights[Idx] =
-          Info.Weights[Idx] > SubWeight
-              ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
-              : SubWeight;
-}
-
-/// Initialize the weights for all exiting blocks.
-static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
-                              Loop *L) {
-  SmallVector<BasicBlock *> ExitingBlocks;
-  L->getExitingBlocks(ExitingBlocks);
-  for (BasicBlock *ExitingBlock : ExitingBlocks) {
-    Instruction *Term = ExitingBlock->getTerminator();
-    SmallVector<uint32_t> Weights;
-    if (!extractBranchWeights(*Term, Weights))
-      continue;
-
-    // See the comment on updateBranchWeights() for an explanation of what we
-    // do here.
-    uint32_t FallThroughWeights = 0;
-    uint32_t ExitWeights = 0;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (L->contains(Succ))
-        FallThroughWeights += Weight;
-      else
-        ExitWeights += Weight;
-    }
-
-    // Don't try to update weights for degenerate case.
-    if (FallThroughWeights == 0)
-      continue;
-
-    SmallVector<uint32_t> SubWeights;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (!L->contains(Succ)) {
-        // Exit weights stay the same.
-        SubWeights.push_back(0);
-        continue;
-      }
-
-      // Subtract exit weights on each iteration, distributed across all
-      // fallthrough edges.
-      double W = (double)Weight / (double)FallThroughWeights;
-      SubWeights.push_back((uint32_t)(ExitWeights * W));
-    }
-
-    WeightInfos.insert({Term, {std::move(Weights), std::move(SubWeights)}});
-  }
-}
-
 /// Clones the body of the loop L, putting it between \p InsertTop and \p
 /// InsertBot.
 /// \param IterNumber The serial number of the iteration currently being
@@ -1008,11 +930,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
   Instruction *LatchTerm =
       cast<Instruction>(cast<BasicBlock>(Latch)->getTerminator());
 
-  // If we have branch weight information, we'll want to update it for the
-  // newly created branches.
-  DenseMap<Instruction *, WeightInfo> Weights;
-  initBranchWeights(Weights, L);
-
   // Identify what noalias metadata is inside the loop: if it is inside the
   // loop, the associated metadata must be cloned for each iteration.
   SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
@@ -1040,10 +957,20 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
     assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
-    for (auto &[Term, Info] : Weights) {
-      auto *TermCopy = cast<Instruction>(VMap[Term]);
-      updateBranchWeights(TermCopy, Info);
-    }
+    // Do not adjust the branch weights of an exiting block of a peeled-off loop
+    // iteration or of the remaining loop.  Before peeling, once any iteration
+    // is actually reached, the probability of the loop exiting at the
+    // iteration's end is exactly the same across all iterations because there's
+    // only one set of branch weights for them all.  Peeling does not change
+    // those probabilities, so there's no reason to adjust the branch weights.
+    //
+    // Of course, the probability of *reaching* any particular iteration is
+    // logically less than for the previous iteration exactly if the previous
+    // iteration has a non-zero probability of exiting the loop.  In a previous
+    // implementation, that observation was apparently used to justify
+    // decreasing the branch weights across iterations, but all that
+    // accomplishes is corrupting the probabilities relative to the original
+    // loop.
 
     // Remove Loop metadata from the latch branch instruction
     // because it is not the Loop's latch branch anymore.
@@ -1070,10 +997,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
     PHI->setIncomingValueForBlock(NewPreHeader, NewVal);
   }
 
-  for (const auto &[Term, Info] : Weights) {
-    setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  }
-
   // Update Metadata for count of peeled off iterations.
   unsigned AlreadyPeeled = 0;
   if (auto Peeled = getOptionalIntLoopAttribute(L, PeeledCountMetaData))
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
index c58f8f1f4e4ee..63a0dd4b4b4f9 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll
@@ -15,9 +15,9 @@ define void @test() {
 ; CHECK:       loop.peel:
 ; CHECK-NEXT:    [[X_PEEL:%.*]] = call i32 @get.x()
 ; CHECK-NEXT:    switch i32 [[X_PEEL]], label [[LOOP_LATCH_PEEL:%.*]] [
-; CHECK-NEXT:    i32 0, label [[LOOP_LATCH_PEEL]]
-; CHECK-NEXT:    i32 1, label [[LOOP_EXIT:%.*]]
-; CHECK-NEXT:    i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT:      i32 0, label [[LOOP_LATCH_PEEL]]
+; CHECK-NEXT:      i32 1, label [[LOOP_EXIT:%.*]]
+; CHECK-NEXT:      i32 2, label [[LOOP_EXIT]]
 ; CHECK-NEXT:    ], !prof [[PROF0:![0-9]+]]
 ; CHECK:       loop.latch.peel:
 ; CHECK-NEXT:    br label [[LOOP_PEEL_NEXT:%.*]]
@@ -26,10 +26,10 @@ define void @test() {
 ; CHECK:       loop.peel2:
 ; CHECK-NEXT:    [[X_PEEL3:%.*]] = call i32 @get.x()
 ; CHECK-NEXT:    switch i32 [[X_PEEL3]], label [[LOOP_LATCH_PEEL4:%.*]] [
-; CHECK-NEXT:    i32 0, label [[LOOP_LATCH_PEEL4]]
-; CHECK-NEXT:    i32 1, label [[LOOP_EXIT]]
-; CHECK-NEXT:    i32 2, label [[LOOP_EXIT]]
-; CHECK-NEXT:    ], !prof [[PROF1:![0-9]+]]
+; CHECK-NEXT:      i32 0, label [[LOOP_LATCH_PEEL4]]
+; CHECK-NEXT:      i32 1, label [[LOOP_EXIT]]
+; CHECK-NEXT:      i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT:    ], !prof [[PROF0]]
 ; CHECK:       loop.latch.peel4:
 ; CHECK-NEXT:    br label [[LOOP_PEEL_NEXT1:%.*]]
 ; CHECK:       loop.peel.next1:
@@ -41,31 +41,33 @@ define void @test() {
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[X:%.*]] = call i32 @get.x()
 ; CHECK-NEXT:    switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
-; CHECK-NEXT:    i32 0, label [[LOOP_LATCH]]
-; CHECK-NEXT:    i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
-; CHECK-NEXT:    i32 2, label [[LOOP_EXIT_LOOPEXIT]]
-; CHECK-NEXT:    ], !prof [[PROF2:![0-9]+]]
+; CHECK-NEXT:      i32 0, label [[LOOP_LATCH]]
+; CHECK-NEXT:      i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
+; CHECK-NEXT:      i32 2, label [[LOOP_EXIT_LOOPEXIT]]
+; CHECK-NEXT:    ], !prof [[PROF0]]
 ; CHECK:       loop.latch:
-; CHECK-NEXT:    br label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br label [[LOOP]], !llvm.loop [[LOOP1:![0-9]+]]
 ; CHECK:       loop.exit.loopexit:
 ; CHECK-NEXT:    br label [[LOOP_EXIT]]
 ; CHECK:       loop.exit:
 ; CHECK-NEXT:    ret void
+;
+; DISABLEADV-LABEL: @test(
+; DISABLEADV-NEXT:  entry:
+; DISABLEADV-NEXT:    br label [[LOOP:%.*]]
+; DISABLEADV:       loop:
+; DISABLEADV-NEXT:    [[X:%.*]] = call i32 @get.x()
+; DISABLEADV-NEXT:    switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
+; DISABLEADV-NEXT:      i32 0, label [[LOOP_LATCH]]
+; DISABLEADV-NEXT:      i32 1, label [[LOOP_EXIT:%.*]]
+; DISABLEADV-NEXT:      i32 2, label [[LOOP_EXIT]]
+; DISABLEADV-NEXT:    ], !prof [[PROF0:![0-9]+]]
+; DISABLEADV:       loop.latch:
+; DISABLEADV-NEXT:    br label [[LOOP]]
+; DISABLEADV:       loop.exit:
+; DISABLEADV-NEXT:    ret void
+;
 
-; DISABLEADV-LABEL: @test()
-; DISABLEADV-NEXT: entry:
-; DISABLEADV-NEXT:  br label %loop
-; DISABLEADV: loop
-; DISABLEADV-NEXT:  %x = call i32 @get.x()
-; DISABLEADV-NEXT:  switch i32 %x, label %loop.latch [
-; DISABLEADV-NEXT:    i32 0, label %loop.latch
-; DISABLEADV-NEXT:    i32 1, label %loop.exit
-; DISABLEADV-NEXT:    i32 2, label %loop.exit
-; DISABLEADV-NEXT:  ], !prof !0
-; DISABLEADV: loop.latch:
-; DISABLEADV-NEXT:  br label %loop
-; DISABLEADV: loop.exit:
-; DISABLEADV-NEXT:  ret void
 
 entry:
   br label %loop
@@ -89,9 +91,9 @@ loop.exit:
 
 ;.
 ; CHECK: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
-; CHECK: [[PROF1]] = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
-; CHECK: [[PROF2]] = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
-; CHECK: [[LOOP3]] = distinct !{!3, !4, !5}
-; CHECK: [[META4:![0-9]+]] = !{!"llvm.loop.peeled.count", i32 2}
-; CHECK: [[META5:![0-9]+]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+;.
+; DISABLEADV: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
 ;.


