[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform (PR #182405)

Thu Feb 19 15:56:06 PST 2026

llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: Joel E. Denny (jdenny-ornl)

<details>
<summary>Changes</summary>

This patch introduces the command-line option `-unroll-uniform-weights`.  After LoopUnroll converts some iterations' latches to unconditional, it must compute probabilities for the N conditional latches that remain in the unrolled loop.  It now supports three strategies for doing so:

- A. If N <= 2, use a simple formula to compute a single uniform probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply the original loop's probability to all N latches and then, as needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform probability across all N latches.  This patch implements this strategy.
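
As a worked illustration of strategy A, the following is a sketch for the completely unrolled case only (backedge eliminated), derived from the `FreqOne = Sum(i=0..n)(c_i * p^i)` model in the code comments.  The symbols are assumptions made for this illustration: $c_i$ stands for `IterCounts[i]`, $F$ for `FreqDesired`, and $p$ for the uniform probability `Prob`.

$$
N = 1:\quad c_0 + c_1 p = F \;\Longrightarrow\; p = \frac{F - c_0}{c_1}
$$

$$
N = 2:\quad c_2 p^2 + c_1 p + (c_0 - F) = 0 \;\Longrightarrow\; p = \frac{-c_1 + \sqrt{c_1^2 - 4 c_2 (c_0 - F)}}{2 c_2}
$$

In both cases the patch clamps the result to [0,1], because a value outside that range means $F$ is impossibly low or high for the loop.  For N > 2 the polynomial has degree 3 or more, which is where strategies B and C take over.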

An issue with C is that the bisection could increase compile time, so this patch makes it opt-in.  Its appeal over B is that it treats all latches the same, given that we have no evidence that any latch should have a higher or lower probability than any other.  A has neither problem, but I do not know how to apply it for N > 2.  More experience or feedback from others might show that some strategies are not worth maintaining.
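
To make strategy C concrete, here is a minimal standalone sketch of the bisection idea.  It is not the code from this patch: `computeFreq` and `bisectUniformProb` are hypothetical free functions written for illustration, `iterCounts` stands in for the patch's `IterCounts` (one entry per segment between conditional latches), and the two precision constants simply reuse the values that appear in the diff.  It relies on the property stated in the code comments that the frequency grows monotonically with the probability.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the patch's ComputeFreq lambda.
// iterCounts[i] is c_i, the number of original loop bodies that execute
// after i conditional latches have continued to the next iteration.
double computeFreq(const std::vector<double> &iterCounts,
                   bool completelyUnroll, double prob) {
  double probReaching = 1.0;      // p^0
  double freqOne = iterCounts[0]; // c_0 * p^0
  for (std::size_t i = 1; i < iterCounts.size(); ++i) {
    probReaching *= prob;                    // p^i
    freqOne += iterCounts[i] * probReaching; // c_i * p^i
  }
  // With no backedge, one pass through the unrolled body is all there is.
  double probReachingBackedge = completelyUnroll ? 0.0 : probReaching;
  if (probReachingBackedge == 1.0)
    return std::numeric_limits<double>::infinity();
  return freqOne / (1.0 - probReachingBackedge);
}

// Find one probability, shared by all conditional latches, whose frequency
// is as close as possible to freqDesired.  Returns 0 or 1 when freqDesired
// is impossibly low or high.
double bisectUniformProb(const std::vector<double> &iterCounts,
                         bool completelyUnroll, double freqDesired) {
  const double freqPrec = 1e-6;  // stop when the frequency is this close
  const double probPrec = 1e-12; // hard limit on the search interval width
  if (computeFreq(iterCounts, completelyUnroll, 0.0) >= freqDesired - freqPrec)
    return 0.0;
  if (computeFreq(iterCounts, completelyUnroll, 1.0) <= freqDesired + freqPrec)
    return 1.0;
  double lo = 0.0, hi = 1.0;
  while (hi - lo > probPrec) {
    double mid = (lo + hi) / 2;
    double delta = computeFreq(iterCounts, completelyUnroll, mid) - freqDesired;
    if (std::fabs(delta) < freqPrec)
      return mid;
    if (delta < 0)
      lo = mid; // frequency too low, so the probability must be higher
    else
      hi = mid; // frequency too high, so the probability must be lower
  }
  return (lo + hi) / 2;
}
```

For example, with four iterations and three conditional latches in a completely unrolled loop (`iterCounts = {1, 1, 1, 1}`), a desired frequency of 1.875 yields a probability of 0.5, since 1 + 0.5 + 0.25 + 0.125 = 1.875.  Because the interval halves on every step, the `probPrec` bound caps the search at roughly 40 evaluations.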

---

Patch is 42.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182405.diff


3 Files Affected:

- (modified) llvm/lib/Transforms/Utils/LoopUnroll.cpp (+97-15) 
- (modified) llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll (+190-77) 
- (modified) llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll (+49-21) 


``````````diff

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 404e254c8a66f..1028b0f026ea3 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -89,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
                     cl::desc("Allow runtime unrolled loops to be unrolled "
                              "with epilog instead of prolog."));
 
+static cl::opt<bool> UnrollUniformWeights(
+    "unroll-uniform-weights", cl::init(false), cl::Hidden,
+    cl::desc("If new branch weights must be found, work harder to keep them "
+             "uniform."));
+
 static cl::opt<bool>
 UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
                     cl::desc("Verify domtree after unrolling"),
@@ -493,9 +498,9 @@ static bool canHaveUnrollRemainder(const Loop *L) {
 //
 // There are often many sets of latch probabilities that can produce the
 // original total loop body frequency.  If there are many remaining conditional
-// latches, this function just quickly hacks a few of their probabilities to
-// restore the original total loop body frequency.  Otherwise, it determines
-// less arbitrary probabilities.
+// latches and !UnrollUniformWeights, this function just quickly hacks a few of
+// their probabilities to restore the original total loop body frequency.
+// Otherwise, it tries harder to determine less arbitrary probabilities.
 static void fixProbContradiction(UnrollLoopOptions ULO,
                                  BranchProbability OriginalLoopProb,
                                  bool CompletelyUnroll,
@@ -580,10 +585,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
       SetProb(I, Prob);
   };
 
-  // If n <= 2, we choose the simplest probability model we can think of: every
-  // remaining conditional branch instruction has the same probability, Prob,
-  // of continuing to the next iteration.  This model has several helpful
-  // properties:
+  // If UnrollUniformWeights or n <= 2, we choose the simplest probability model
+  // we can think of: every remaining conditional branch instruction has the
+  // same probability, Prob, of continuing to the next iteration.  This model
+  // has several helpful properties:
+  // - There is only one search parameter, Prob.
   // - We have no reason to think one latch branch's probability should be
   //   higher or lower than another, and so this model makes them all the same.
   //   In the worst cases, we thus avoid setting just some probabilities to 0 or
@@ -596,14 +602,32 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   //
   //     FreqOne = Sum(i=0..n)(c_i * p^i)
   //
-  // - If the backedge has been eliminated, FreqOne is the total frequency of
-  //   the original loop body in the unrolled loop.
-  // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) =
-  //   FreqOne / (1 - p^n) is the total frequency of the original loop body in
-  //   the unrolled loop, regardless of whether the backedge is conditional or
-  //   unconditional.
+  // - If the backedge has been eliminated:
+  //   - FreqOne is the total frequency of the original loop body in the
+  //     unrolled loop.
+  //   - If Prob == 1, the total frequency of the original loop body is exactly
+  //     the number of remaining loop iterations, as expected because every
+  //     remaining loop iteration always then executes.
+  // - If the backedge remains:
+  //   - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
+  //     frequency of the original loop body in the unrolled loop, regardless of
+  //     whether the backedge is conditional or unconditional.
+  //   - As Prob approaches 1, the total frequency of the original loop body
+  //     approaches infinity, as expected because the loop approaches never
+  //     exiting.
   // - For n <= 2, we can use simple formulas to solve the above polynomial
   //   equations exactly for p without performing a search.
+  // - For n > 2, evaluating each point in the search space, using ComputeFreq
+  //   below, requires about as few instructions as we could hope for.  That is,
+  //   the probability is constant across the conditional branches, so the only
+  //   computation is across conditional branches and any backedge, as required
+  //   for any model for Prob.
+  // - Prob == 1 produces the maximum possible total frequency for the original
+  //   loop body, as described above.  Prob == 0 produces the minimum, 0.
+  //   Increasing or decreasing Prob monotonically increases or decreases the
+  //   frequency, respectively.  Thus, for every possible frequency, there
+  //   exists some Prob that can produce it, and we can easily use bisection to
+  //   search the problem space.
 
   // When iterating for a solution, we stop early if we find probabilities
   // that produce a Freq whose difference from FreqDesired is small
@@ -611,6 +635,21 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   // accurate (but surely far more accurate).
   const double FreqPrec = 1e-6;
 
+  // Compute the new frequency produced by using Prob throughout CondLatches.
+  auto ComputeFreq = [&](double Prob) {
+    double ProbReaching = 1;        // p^0
+    double FreqOne = IterCounts[0]; // c_0*p^0
+    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+      ProbReaching *= Prob;                        // p^(I+1)
+      FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+    }
+    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+    assert(FreqOne > 0 && "Expected at least one iteration before first latch");
+    if (ProbReachingBackedge == 1)
+      return std::numeric_limits<double>::infinity();
+    return FreqOne / (1 - ProbReachingBackedge);
+  };
+
   // Compute the probability that, used at CondLaches[0] where
   // CondLatches.size() == 1, gets as close as possible to FreqDesired.
   auto ComputeProbForLinear = [&]() {
@@ -619,6 +658,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
     double B = IterCounts[0] - FreqDesired;
     assert(A > 0 && "Expected iterations after last conditional latch");
     double Prob = -B / A;
+    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+    // Otherwise, Prob should produce nearly FreqDesired.
+    assert((Prob < 0 || Prob > 1 ||
+            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+           "Expected accurate frequency when linear case is possible");
     Prob = std::max(Prob, 0.);
     Prob = std::min(Prob, 1.);
     return Prob;
@@ -633,6 +677,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
     double C = IterCounts[0] - FreqDesired;
     assert(A > 0 && "Expected iterations after last conditional latch");
     double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
+    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+    // Otherwise, Prob should produce nearly FreqDesired.
+    assert((Prob < 0 || Prob > 1 ||
+            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+           "Expected accurate frequency when quadratic case is possible");
     Prob = std::max(Prob, 0.);
     Prob = std::min(Prob, 1.);
     return Prob;
@@ -733,12 +782,17 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   };
 
   // Determine and set branch weights.
+  //
+  // Prob < 0 and Prob > 1 cannot be represented as branch weights.  We might
+  // compute such a Prob if FreqDesired is impossible (e.g., due to inaccurate
+  // profile data) for the maximum trip count we have determined when completely
+  // unrolling.  In that case, so just go with whichever is closest.
   if (CondLatches.size() == 1) {
     SetAllProbs(ComputeProbForLinear());
   } else if (CondLatches.size() == 2) {
     SetAllProbs(ComputeProbForQuadratic());
-  } else {
-    // The polynomial is too complex for a simple formula, so the quick and
+  } else if (!UnrollUniformWeights) {
+    // The polynomial is too complex for a simple formula, and the quick and
     // dirty fix has been selected.  Adjust probabilities starting from the
     // first latch, which has the most influence on the total frequency, so
     // starting there should minimize the number of latches that have to be
@@ -752,6 +806,34 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
       if (fabs(Freq - FreqDesired) < FreqPrec)
         break;
     }
+  } else {
+    // The polynomial is too complex for a simple formula, and uniform branch
+    // weights have been selected, so bisect.
+    double ProbMin, ProbMax, ProbPrev;
+    auto TryProb = [&](double Prob) {
+      ProbPrev = Prob;
+      double FreqDelta = ComputeFreq(Prob) - FreqDesired;
+      if (fabs(FreqDelta) < FreqPrec)
+        return 0;
+      if (FreqDelta < 0) {
+        ProbMin = Prob;
+        return -1;
+      }
+      ProbMax = Prob;
+      return 1;
+    };
+    // If Prob == 0 is too small and Prob == 1 is too large, bisect between
+    // them.  To place a hard upper limit on the search time, stop bisecting
+    // when Prob stops changing (ProbDelta) by much (ProbPrec).
+    if (TryProb(0.) < 0 && TryProb(1.) > 0) {
+      const double ProbPrec = 1e-12;
+      double Prob, ProbDelta;
+      do {
+        Prob = (ProbMin + ProbMax) / 2;
+        ProbDelta = Prob - ProbPrev;
+      } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
+    }
+    SetAllProbs(ProbPrev);
   }
 
   // FIXME: We have not considered non-latch loop exits:
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
index 353e74be9fbd1..69da20802a0ae 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -58,12 +58,14 @@
 ; Check 1 max iteration:
 ; - Unroll count of >=1 should always produce complete unrolling.
 ; - That produces 0 unrolled iteration latches, so there are no branch weights
-;   to compute.
+;   to compute.  Thus, -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1210
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
 ;
@@ -77,6 +79,8 @@
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1110
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
 ;
@@ -90,7 +94,8 @@
 ; Check 2 max iterations:
 ; - Unroll count of >=2 should always produce complete unrolling.
 ; - That produces <=1 unrolled iteration latch, so the implementation can
-;   compute uniform weights by solving, at worst, a linear equation.
+;   compute uniform weights by solving, at worst, a linear equation.  Thus,
+;   -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
 ;
@@ -99,6 +104,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2310
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
 ;
@@ -120,6 +127,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2320
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
 ;
@@ -140,6 +149,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2210
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
 ;
@@ -159,6 +170,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2220
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
 ;
@@ -179,6 +192,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2110
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
 ;
@@ -198,6 +213,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2120
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
 ;
@@ -215,7 +232,8 @@
 ; Check 3 max iterations:
 ; - Unroll count of >=3 should always produce complete unrolling.
 ; - That produces <=2 unrolled iteration latches, so the implementation can
-;   compute uniform weights solving, at worst, a quadratic equation.
+;   compute uniform weights solving, at worst, a quadratic equation.  Thus,
+;   -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
 ;
@@ -224,6 +242,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3410
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
 ;
@@ -248,6 +268,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3430
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
 ;
@@ -269,6 +291,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG343x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
 ;
@@ -295,6 +319,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
 ;
@@ -317,6 +343,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3330
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
 ;
@@ -338,6 +366,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG333x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
 ;
@@ -363,6 +393,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
 ;
@@ -385,6 +417,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3230
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
 ;
@@ -406,6 +440,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG323x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
 ;
@@ -430,6 +466,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
 ;
@@ -452,6 +490,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3130
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
 ;
@@ -473,6 +513,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG313x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
 ;
@@ -496,6 +538,7 @@
 ; - Unroll count of >=4 should always produce complete unrolling.
 ; - That produces <=3 unrolled iteration latches.  3 is the lowest number where
 ;   the implementation cannot compute uniform weights using a simple formula.
+;   Thus, this is our first case where -unroll-uniform-weights matters.
 ;
 ; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
 ;
@@ -504,6 +547,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4510
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
 ;
@@ -531,6 +576,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4540
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
 ;
@@ -554,6 +601,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG454x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-unifo...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/182405